IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification

Emerging Science Journal (Q1, Multidisciplinary) · Publication date: 2023-10-01 · DOI: 10.28991/esj-2023-07-05-04
Lilis Yuningsih, Gede Angga Pradipta, Dadang Hermawan, Putu Desiana Wulaning Ayu, Dandy Pramana Hostiadi, Roy Rudolf Huizen
Abstract

Imbalanced learning problems challenge classifiers when data samples are unevenly distributed among classes. The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the best-known data pre-processing methods. Oversampling with SMOTE can introduce noise, small disjunct samples, and overfitting when a dataset has a high imbalance ratio. A high imbalance ratio combined with low variance causes the generated synthetic samples to cluster in narrow, conflicting regions between classes, making them susceptible to overfitting during the learning process. Therefore, this research proposes a combination of Radius-SMOTE and the Bagging algorithm, called the IRS-BAG model. For each sub-sample generated by bootstrapping, oversampling is performed with Radius-SMOTE; oversampling each sub-sample separately is intended to mitigate the overfitting that might otherwise occur. Experiments compared the performance of the IRS-BAG model with various previous oversampling methods on imbalanced public datasets. The results with three different classifiers showed that every classifier gained a notable improvement when combined with the proposed IRS-BAG model, compared with previous state-of-the-art oversampling methods.
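The procedure sketched in the abstract — draw bootstrap sub-samples, oversample the minority class within each sub-sample before training, then vote over the base learners — can be illustrated as follows. This is a minimal sketch, not the authors' implementation: plain SMOTE-style interpolation stands in for Radius-SMOTE (whose radius-based safe-region selection is specific to the paper), and all function names and parameters here are illustrative.

```python
# Sketch of the IRS-BAG idea: bagging where each bootstrap sub-sample
# is oversampled before a base classifier is trained on it.
# SMOTE-style linear interpolation is used as a stand-in for Radius-SMOTE.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each randomly chosen seed and one of its k nearest minority neighbours."""
    rng = rng if rng is not None else np.random.default_rng(0)
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    seeds = rng.integers(0, len(X_min), n_new)
    out = np.empty((n_new, X_min.shape[1]))
    for row, s in enumerate(seeds):
        j = idx[s][rng.integers(1, k + 1)]  # skip idx[s][0], the seed itself
        gap = rng.random()                  # interpolation factor in [0, 1)
        out[row] = X_min[s] + gap * (X_min[j] - X_min[s])
    return out

def fit_irs_bag(X, y, n_estimators=10, seed=0):
    """Bagging loop: bootstrap, balance each sub-sample by oversampling
    its minority class, and train one base classifier per sub-sample."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        boot = rng.integers(0, len(X), len(X))   # bootstrap with replacement
        Xb, yb = X[boot], y[boot]
        minority = np.argmin(np.bincount(yb))    # assumes integer labels 0/1
        X_min = Xb[yb == minority]
        deficit = (yb != minority).sum() - len(X_min)
        if deficit > 0 and len(X_min) > 1:
            X_syn = smote_like_oversample(X_min, deficit, rng=rng)
            Xb = np.vstack([Xb, X_syn])
            yb = np.concatenate([yb, np.full(deficit, minority)])
        models.append(DecisionTreeClassifier(random_state=0).fit(Xb, yb))
    return models

def predict_irs_bag(models, X):
    """Majority vote over the ensemble's predictions."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each sub-sample is oversampled independently, the synthetic points differ across base learners, which is the mechanism the abstract credits with reducing the overfitting that a single globally oversampled training set can suffer.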
Citations: 0

Source journal

Emerging Science Journal (Multidisciplinary)
CiteScore: 5.40
Self-citation rate: 0.00%
Articles published: 155
Review time: 10 weeks