Class imbalance in datasets presents a significant challenge in machine learning, often causing traditional classification algorithms to exhibit bias toward majority classes at the expense of minority classes, which are frequently of crucial importance in practical applications. This challenge is further exacerbated by the presence of label noise, which impedes the identification of optimal decision boundaries between classes and can lead to model overfitting. While extensive research has addressed class imbalance and label noise as separate phenomena, there remains a notable gap in the literature regarding their concurrent occurrence, specifically in the domain of imbalanced classification with label noise (ICLN). This review aims to bridge this gap by conducting an extensive analysis of existing methodologies that address ICLN challenges. Our review encompasses approaches across diverse categories, including resampling techniques, ensemble methods, cost-sensitive learning, deep learning, active learning, meta-learning, and hybrid methodologies. Through rigorous empirical evaluation, we compare representative methods from each category on synthetic and real-world datasets, revealing a trade-off among minority class preservation, noise robustness, and computational efficiency. Our findings show that algorithm effectiveness is fundamentally dataset-dependent, with deep learning methods excelling on complex datasets while resampling approaches achieve competitive performance at lower computational cost. Statistical significance analysis validates our empirical observations, and we identify concrete future research directions for advancing ICLN methodologies.
