Undersampling Near Decision Boundary for Imbalance Problems

Jianjun Zhang, Ting Wang, Wing W. Y. Ng, Shuai Zhang, C. Nugent
2019 International Conference on Machine Learning and Cybernetics (ICMLC), published 2019-07-01. DOI: 10.1109/ICMLC48188.2019.8949290
Citations: 14

Abstract

Undersampling a dataset to rebalance its class distribution is an effective way to handle class imbalance problems. However, randomly removing majority examples under a uniform distribution may cause unnecessary information loss, which degrades the performance of classifiers trained on the rebalanced dataset. Examples also differ in their sensitivity to class imbalance: an example with higher sensitivity is more easily affected by class imbalance. This sensitivity can guide the selection of examples when rebalancing the class distribution and can boost classifier performance. In this paper, we therefore propose a novel undersampling method, UnderSampling using Sensitivity (USS), based on the sensitivity of each majority example. Examples with low sensitivity are noisy or safe examples, while examples with high sensitivity are borderline examples. In USS, majority examples with higher sensitivity are more likely to be selected. Experiments on 20 datasets confirm the superiority of USS over one baseline method and five resampling methods.
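
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how sensitivity is computed, nor the exact selection rule, so the code below makes two assumptions: the sensitivity values are supplied as precomputed non-negative numbers, and majority examples are drawn without replacement with probability proportional to sensitivity. The function name `undersample_by_sensitivity` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def undersample_by_sensitivity(majority, sensitivities, n_keep, seed=0):
    """Keep n_keep majority examples, favouring higher sensitivity.

    Assumption: sensitivities are precomputed inputs; the abstract does not
    define how USS derives them. Sampling proportional to sensitivity is
    also an assumption ("more likely to be selected" is all that is stated).
    """
    if len(majority) != len(sensitivities):
        raise ValueError("one sensitivity value is required per example")
    if not 0 <= n_keep <= len(majority):
        raise ValueError("n_keep must be between 0 and len(majority)")
    if any(s < 0 for s in sensitivities):
        raise ValueError("sensitivities must be non-negative")

    rng = random.Random(seed)
    remaining = list(range(len(majority)))  # indices not yet selected
    kept = []
    for _ in range(n_keep):
        weights = [sensitivities[i] for i in remaining]
        if sum(weights) == 0:
            # Only zero-sensitivity examples are left: choose uniformly.
            pick = rng.randrange(len(remaining))
        else:
            # Draw one position with probability proportional to its weight.
            pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        kept.append(remaining.pop(pick))
    # Return the kept examples in their original order.
    return [majority[i] for i in sorted(kept)]


# Toy data: six majority examples, three of which have zero sensitivity.
majority = ["a", "b", "c", "d", "e", "f"]
sensitivities = [0.0, 0.9, 0.0, 0.8, 0.0, 0.7]
balanced = undersample_by_sensitivity(majority, sensitivities, n_keep=3)
```

With these toy values the three examples with non-zero sensitivity are always the ones kept, because zero-weight examples cannot be drawn while any positive weight remains. In practice `n_keep` would typically be set to the size of the minority class.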