Over-sampling methods concentrate on creating balanced samples and have proven successful in classifying imbalanced data. However, current over-sampling methods do not account for the uncertainty of the samples they produce, which can alter the data distribution and degrade classification. To address this issue, we propose a distribution assessment-based multiple over-sampling (DAMO) method for classifying imbalanced data. We first introduce a multiple over-sampling method based on distribution assessment to create different forms of synthetic samples. Its core is to quantify the inconsistency of the data distribution before and after sampling and to use it as a constraint that guides multiple over-sampling, thereby minimizing the data shift and characterizing the uncertainty of the produced samples. Then, we quantify the local reliability of the classification results and select the imprecise samples, those with low local reliability that are indistinguishable between classes. The neighbors of each imprecise sample serve as complementary information to calibrate its results, thereby reducing the likelihood of misclassification. The calibrated results are combined using the discounting Dempster-Shafer fusion rule to make a final decision. DAMO's efficiency has been demonstrated through comparisons with related methods on various real imbalanced datasets.
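The final fusion step can be illustrated with a minimal sketch of classical Shafer discounting followed by Dempster's rule of combination on a two-class frame. This is not the DAMO implementation: the abstract does not specify how mass functions or discounting factors are derived, so the class labels, the mass values, the reliability factors `alpha`, and the function names `discount` and `dempster_combine` are all assumptions made for illustration.

```python
from itertools import product


def discount(mass, alpha, frame):
    """Shafer discounting: keep a fraction alpha of each focal mass and
    transfer the remainder to the whole frame (total ignorance)."""
    out = {focal: alpha * value for focal, value in mass.items() if focal != frame}
    out[frame] = 1.0 - alpha + alpha * mass.get(frame, 0.0)
    return out


def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalize by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb  # mass assigned to the empty set
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {focal: value / (1.0 - conflict) for focal, value in combined.items()}


# Hypothetical two-class frame and classifier outputs (illustrative values only).
MINORITY = frozenset({"minority"})
MAJORITY = frozenset({"majority"})
FRAME = MINORITY | MAJORITY

result_a = {MINORITY: 0.8, MAJORITY: 0.2}  # e.g. result from one over-sampled set
result_b = {MINORITY: 0.4, MAJORITY: 0.6}  # e.g. result from another over-sampled set

# Assumed reliabilities: the first result is trusted more than the second.
fused = dempster_combine(
    discount(result_a, alpha=0.9, frame=FRAME),
    discount(result_b, alpha=0.5, frame=FRAME),
)

# Decide on the singleton class with the largest fused mass.
decision = max((MINORITY, MAJORITY), key=lambda c: fused.get(c, 0.0))
print(sorted(decision)[0], fused)
```

In this sketch a less reliable result has more of its mass moved to the whole frame before combination, so it has less influence on the fused decision.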
