IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification

Emerging Science Journal (JCR: Q1, Multidisciplinary). Published 2023-10-01. DOI: 10.28991/esj-2023-07-05-04

Lilis Yuningsih, Gede Angga Pradipta, Dadang Hermawan, Putu Desiana Wulaning Ayu, Dandy Pramana Hostiadi, Roy Rudolf Huizen

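Abstract

Imbalanced learning is a challenge for classifiers when data samples are unevenly distributed among classes. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the best-known data pre-processing methods for this problem. However, oversampling with SMOTE can introduce noise, produce small disjuncts, and cause overfitting when a dataset has a high imbalance ratio. Under a high imbalance ratio and low variance, synthetic samples tend to cluster in narrow areas and in regions where classes conflict, which makes machine learning models prone to overfitting during training. This research therefore proposes IRS-BAG, a model that combines Radius-SMOTE with the Bagging algorithm: each sub-sample generated by bootstrapping is oversampled with Radius-SMOTE, and oversampling at the sub-sample level is intended to mitigate the overfitting that might otherwise occur. Experiments compared the performance of IRS-BAG with various previous oversampling methods on imbalanced public data. Results with three different classifiers show that every classifier achieved a notable improvement when combined with IRS-BAG, compared with previous state-of-the-art oversampling methods.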
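
The abstract describes IRS-BAG only at a high level: draw bootstrap sub-samples, oversample the minority class inside each one, train a base classifier per sub-sample, and combine them. The Python sketch below illustrates that pipeline under explicit assumptions. The "safe radius" rule in `radius_oversample` (each synthetic point stays closer to its minority seed than that seed's nearest majority-class sample) is a simplified assumption, not the published Radius-SMOTE procedure. The nearest-centroid base learner and plain majority voting are illustrative stand-ins, since the abstract does not name the three classifiers. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def radius_oversample(X, y, rng):
    """Balance a binary sample by appending synthetic minority points.

    ASSUMPTION (simplified, not the paper's exact Radius-SMOTE): each
    synthetic point lies in a random direction from a random minority seed,
    at a distance smaller than the seed's distance to its nearest majority sample.
    """
    counts = Counter(y)
    if len(counts) < 2:
        return list(X), list(y)  # only one class present: nothing to balance
    (maj, n_maj), (mino, n_min) = counts.most_common(2)
    minority = [x for x, label in zip(X, y) if label == mino]
    majority = [x for x, label in zip(X, y) if label == maj]
    X_out, y_out = list(X), list(y)
    for _ in range(n_maj - n_min):
        seed = rng.choice(minority)
        radius = min(dist(seed, m) for m in majority)
        # Normalized Gaussian vector gives a uniformly random direction.
        direction = [rng.gauss(0.0, 1.0) for _ in seed]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        step = rng.random() * radius  # random length in [0, radius)
        X_out.append([s + step * d / norm for s, d in zip(seed, direction)])
        y_out.append(mino)
    return X_out, y_out


def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroids(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: dist(x, centroids[label]))


def irs_bag_fit(X, y, n_estimators=10, seed=0):
    """Bagging in which every bootstrap sub-sample is oversampled before training."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw with replacement
        Xb, yb = radius_oversample([X[i] for i in idx], [y[i] for i in idx], rng)
        models.append(fit_centroids(Xb, yb))
    return models


def irs_bag_predict(models, x):
    """Aggregate the ensemble by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D data: 12 majority points (label 0) near the origin,
    # 3 minority points (label 1) near (5, 5).
    X = [[i * 0.1, (i % 3) * 0.1] for i in range(12)] + [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    y = [0] * 12 + [1] * 3
    models = irs_bag_fit(X, y, n_estimators=15, seed=42)
    print(irs_bag_predict(models, [5.1, 5.0]))  # query inside the minority region
    print(irs_bag_predict(models, [0.5, 0.1]))  # query inside the majority region
```

Oversampling inside each bootstrap, rather than once before bagging, means every ensemble member sees a different set of synthetic points, which is the mechanism the abstract credits with reducing overfitting.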
