Imbalanced data classification using random forest with Ward clustering

  • Vo Thi Ngoc Ha
  • Nguyen Thanh Son
  • Dang Dang Khoa
  • Le Phuong Long
  • Phan Thi Thu Ngan

Abstract

This study introduces a Modified Balanced Random Forest algorithm to improve classification performance on imbalanced datasets. The proposed method extends the Balanced Random Forest by applying a clustering-based under-sampling strategy during each bootstrap iteration. Four clustering methods were evaluated: K-Means, Spectral Clustering, Agglomerative Clustering, and Ward Hierarchical Clustering. Of these, Ward Hierarchical Clustering achieved the best performance. Experimental results show that the proposed method outperforms both the standard Random Forest and the Balanced Random Forest, reaching a true positive rate of 93.42 percent, a true negative rate of 93.60 percent, and an area under the curve (AUC) of 93.51 percent, while also reducing processing time. These results confirm the effectiveness of the proposed approach for imbalanced data classification.
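
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how clusters are turned into a sample, so the proportional per-cluster quota, the cluster count, the tree settings, and the names `ward_undersample`, `fit_forest`, and `predict` are all assumptions made for illustration. It assumes binary labels with 1 as the minority class and uses scikit-learn's `AgglomerativeClustering` with Ward linkage.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier


def ward_undersample(X_maj, n_keep, n_clusters, rng):
    """Return row positions of roughly n_keep majority samples,
    spread across Ward clusters (assumed sampling rule)."""
    labels = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="ward"
    ).fit_predict(X_maj)
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        # Assumption: quota proportional to cluster size, at least one row.
        quota = max(1, round(n_keep * len(members) / len(X_maj)))
        quota = min(quota, len(members))
        keep.extend(rng.choice(members, size=quota, replace=False))
    return np.array(keep)


def fit_forest(X, y, n_trees=25, n_clusters=5, seed=0):
    """Train trees on bootstraps whose majority part is cluster-undersampled."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == 1)  # minority class (assumed label 1)
    maj_idx = np.flatnonzero(y == 0)  # majority class (assumed label 0)
    trees = []
    for _ in range(n_trees):
        # Bootstrap each class, then shrink the majority draw via clustering.
        boot_min = rng.choice(min_idx, size=len(min_idx), replace=True)
        boot_maj = rng.choice(maj_idx, size=len(maj_idx), replace=True)
        picked = ward_undersample(X[boot_maj], len(min_idx), n_clusters, rng)
        idx = np.concatenate([boot_min, boot_maj[picked]])
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1 << 31))
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict(trees, X):
    """Majority vote over the trees."""
    votes = np.mean([t.predict(X) for t in trees], axis=0)
    return (votes >= 0.5).astype(int)
```

Sampling from every cluster, rather than uniformly at random as in the Balanced Random Forest, is meant to keep the reduced majority set representative of the different regions of the majority class. Separately, the reported AUC matches the mean of the two reported rates, (93.42 + 93.60) / 2 = 93.51, which is the value AUC takes when it is computed from hard class predictions at a single threshold; the abstract does not state how the figure was obtained.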

Published
2025-10-14