Imbalanced data classification using random forest with Ward clustering

  • Vo Thi Ngoc Ha
  • Nguyen Thanh Son
  • Dang Dang Khoa
  • Le Phuong Long
  • Phan Thi Thu Ngan
Keywords: Imbalanced Data; Random Forest; Balanced Random Forest; Classification Technique.

Abstract

This study introduces a Modified Balanced Random Forest algorithm to improve classification performance on imbalanced datasets. The proposed method extends the Balanced Random Forest by applying a clustering-based undersampling strategy during each bootstrap iteration. Four clustering methods were evaluated: K-Means, Spectral Clustering, Agglomerative Clustering, and Ward Hierarchical Clustering. Of these, Ward Hierarchical Clustering achieved the best performance. Experimental results show that the proposed method outperforms both the standard Random Forest and the Balanced Random Forest, reaching a true positive rate of 93.42 percent, a true negative rate of 93.60 percent, and an area under the curve (AUC) of 93.51 percent, while also reducing processing time. These results support the effectiveness of the proposed approach for imbalanced data classification.
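The abstract does not give implementation details, so the following is only a minimal sketch of how clustering-based undersampling inside each bootstrap iteration might look, not the authors' code. It assumes scikit-learn, binary labels with 1 as the minority class, and one randomly chosen representative per Ward cluster. The helper names `ward_undersample`, `fit_forest`, `predict`, and `tpr_tnr` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier


def ward_undersample(X_major, n_keep, rng):
    """Return indices of n_keep majority rows, one per Ward cluster (assumed strategy)."""
    if n_keep >= len(X_major):
        return np.arange(len(X_major))
    labels = AgglomerativeClustering(n_clusters=n_keep, linkage="ward").fit_predict(X_major)
    # Pick one random representative from each of the n_keep clusters.
    return np.array([rng.choice(np.flatnonzero(labels == c)) for c in range(n_keep)])


def fit_forest(X, y, n_trees=25, seed=0):
    """Fit trees on balanced bootstraps; assumes y is 0/1 with 1 the minority class."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    trees = []
    for _ in range(n_trees):
        # Bootstrap each class separately, then shrink the majority bootstrap
        # to the minority size using Ward clustering.
        boot_min = rng.choice(minority, size=len(minority), replace=True)
        boot_maj = rng.choice(majority, size=len(majority), replace=True)
        keep = ward_undersample(X[boot_maj], len(boot_min), rng)
        idx = np.concatenate([boot_min, boot_maj[keep]])
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1 << 31))
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict(trees, X):
    """Average the per-tree probability of class 1 and threshold at 0.5."""
    proba = np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)
    return (proba >= 0.5).astype(int)


def tpr_tnr(y_true, y_pred):
    """True positive rate and true negative rate for 0/1 labels."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return tpr, tnr
```

Ward clustering scales quadratically in memory with the number of rows, so on large majority classes a sketch like this would likely need a pre-sampling step. The reported 93.51 percent equals the mean of the reported true positive and true negative rates, which is what AUC reduces to at a single operating point; the abstract does not state whether it was computed that way.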

Published on
2025-10-14