Undersampling Near Decision Boundary for Imbalance Problems

2019 International Conference on Machine Learning and Cybernetics (ICMLC). Pub Date: 2019-07-01. DOI: 10.1109/ICMLC48188.2019.8949290

Jianjun Zhang, Ting Wang, Wing W. Y. Ng, Shuai Zhang, C. Nugent


Citations: 14

Abstract

Undersampling a dataset to rebalance its class distribution is an effective way to handle class imbalance problems. However, randomly removing majority examples under a uniform distribution may cause unnecessary information loss, which degrades the performance of classifiers trained on the rebalanced dataset. Moreover, examples differ in their sensitivity to class imbalance. An example with higher sensitivity is more easily affected by class imbalance, and this property can guide the selection of examples used to rebalance the class distribution and boost classifier performance. Therefore, in this paper, we propose a novel undersampling method, UnderSampling using Sensitivity (USS), based on the sensitivity of each majority example. Examples with low sensitivity are noisy or safe examples, while examples with high sensitivity are borderline examples. In USS, majority examples with higher sensitivity are more likely to be selected. Experiments on 20 datasets confirm the superiority of USS over one baseline method and five resampling methods.
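The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not define how sensitivity is computed, so the code below swaps in a hypothetical proxy: for each majority example, p is the fraction of minority examples among its k nearest neighbours, and the score 4p(1-p) is low for safe (p near 0) and noisy (p near 1) examples and high for borderline ones. The function names, the proxy score, and the parameters `k`, `eps`, and `seed` are all assumptions for illustration, not the authors' USS method.

```python
import math
import random


def borderline_scores(X, y, majority_label, k=5):
    """Proxy score for each majority example: 4*p*(1-p), where p is the
    fraction of minority examples among its k nearest neighbours.
    NOTE: an illustrative stand-in, not the paper's sensitivity measure."""
    scores = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            continue
        # Squared Euclidean distance from example i to every other example.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        p = sum(y[j] != majority_label for j in neighbours) / max(len(neighbours), 1)
        scores[i] = 4.0 * p * (1.0 - p)
    return scores


def undersample(X, y, majority_label, k=5, eps=1e-3, seed=0):
    """Return sorted indices of all minority examples plus an equally sized
    sample of majority examples, drawn with probability increasing in score."""
    rng = random.Random(seed)
    scores = borderline_scores(X, y, majority_label, k)
    minority_idx = [i for i, yi in enumerate(y) if yi != majority_label]
    n_keep = min(len(minority_idx), len(scores))
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # key = log(u) / weight; the n_keep largest keys are selected.
    # eps keeps every weight positive so zero-score examples can still be drawn.
    keyed = sorted(
        ((math.log(1.0 - rng.random()) / (s + eps), i) for i, s in scores.items()),
        reverse=True,
    )
    kept_majority = [i for _, i in keyed[:n_keep]]
    return sorted(kept_majority + minority_idx)
```

Calling `undersample(X, y, majority_label=0)` on lists of feature vectors and labels returns the indices of a balanced subset. Sampling by weight rather than keeping only the top-scoring examples preserves some randomness, which matches the abstract's wording that higher-sensitivity examples are "more likely" to be selected rather than always selected.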


