Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique

Liqaa M. Shoohi, J. H. Saud
Al-Mustansiriyah Journal of Sciences, published 2020-04-15. DOI: 10.23851/mjs.v31i2.740

Abstract

Classification of imbalanced data is an important issue. Many classification algorithms have been developed, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and they have been used repeatedly in many fields. These algorithms suffer from the problem of imbalanced data, in which some classes contain many more instances than others. Imbalanced data lead to poor performance and to a bias toward one class at the expense of the other classes. In this paper, we propose three techniques based on the Over-Sampling (O.S.) technique for processing an imbalanced dataset, redistributing its samples, and converting it into a balanced dataset. These techniques are Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalance Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to achieve balance between the minority and majority classes and then calculates the IR between the minority and majority classes. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a higher degree of balance between the minority and majority classes.
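The abstract does not say how Improved SMOTE differs from the original algorithm, so the following is only a minimal sketch of the shared baseline idea: classic SMOTE interpolation that grows the minority class until it matches the majority class, with the IR computed before and after. It assumes IR is defined as the size of the largest class divided by the size of the smallest class (so IR = 1 means perfectly balanced), a common convention that the abstract does not confirm. The function names `imbalance_ratio`, `smote`, and `balance_binary` are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: IR = size of largest class / size of smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote(minority, n_new, k=5, seed=None):
    """Classic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    # Precompute the indices of each sample's k nearest minority neighbours.
    neighbours = []
    for i, x in enumerate(minority):
        order = sorted((j for j in range(len(minority)) if j != i),
                       key=lambda j: math.dist(x, minority[j]))
        neighbours.append(order[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def balance_binary(X, y, minority_label, k=5, seed=None):
    """Oversample the minority class until it is as large as the majority class."""
    majority_count = max(Counter(y).values())
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = majority_count - len(minority)
    new = smote(minority, n_new, k=k, seed=seed) if n_new > 0 else []
    return X + new, y + [minority_label] * len(new)


if __name__ == "__main__":
    X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),                     # minority (label 1)
         (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.3, 5.1),
         (4.8, 4.7), (5.2, 5.3), (5.0, 4.8)]                     # majority (label 0)
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print(round(imbalance_ratio(y), 3))  # 2.333 (7 / 3)
    X_bal, y_bal = balance_binary(X, y, minority_label=1, k=2, seed=0)
    print(imbalance_ratio(y_bal))        # 1.0
```

Borderline-SMOTE and ADASYN would mainly change which minority samples are used as seeds (those near the class boundary) and how many synthetic points each seed receives (weighted by how many majority neighbours it has); the interpolation step stays the same.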