Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning
Riyaz Sikora, Yoon Sang Lee
Information Systems Frontiers, published 2024-08-29. DOI: 10.1007/s10796-024-10533-7 (https://doi.org/10.1007/s10796-024-10533-7)
Journal impact factor: 6.9. JCR: Q1 (Computer Science, Information Systems).
Imbalanced data sets are a growing problem in data mining and business analytics: the ability of machine learning algorithms to predict the minority class deteriorates in the presence of class imbalance. Although many approaches to the imbalance problem have been studied in the literature, most have met with limited success. In this study, we propose three methods based on a wrapper approach that combine under-sampling with ensemble learning to improve the performance of standard data mining algorithms. We test our ensemble methods on 10 data sets collected from the UCI repository, each with an imbalance ratio of at least 70%. We compare their performance with two other traditional techniques for dealing with the imbalance problem and show significant improvement in recall, AUROC, and the average of precision and recall.
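The abstract does not spell out the three proposed methods, so the following is only a generic sketch of the underlying idea: train each ensemble member on a balanced under-sample of the data and combine the members by majority vote. It is not the authors' algorithm. All names (`imbalance_ratio`, `undersample`, `NearestCentroid`, `fit_ensemble`, `predict`, `precision_recall`) are hypothetical, the nearest-centroid learner is a stand-in for a "standard data mining algorithm", and reading "imbalance ratio" as the majority-class share is an assumption.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Share of examples in the largest class (assumed meaning of 'imbalance ratio')."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def undersample(X, y, minority, rng):
    """Keep every minority row plus an equally sized random sample of the other rows."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min_idx + rng.sample(maj_idx, min(len(min_idx), len(maj_idx)))
    return [X[i] for i in keep], [y[i] for i in keep]


class NearestCentroid:
    """Toy base learner: predict the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, row):
        def squared_distance(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=squared_distance)


def fit_ensemble(X, y, minority, n_members=11, seed=0):
    """Train each member on its own balanced under-sample of the training data."""
    rng = random.Random(seed)
    return [NearestCentroid().fit(*undersample(X, y, minority, rng))
            for _ in range(n_members)]


def predict(ensemble, row):
    """Majority vote over the ensemble members."""
    votes = Counter(member.predict_one(row) for member in ensemble)
    return votes.most_common(1)[0][0]


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class treated as positive (the minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because every member sees all minority examples but a different slice of the majority class, the vote is less biased toward the majority class than a single model trained on the full data, while discarding less majority information overall than a single under-sample would. The average of precision and recall reported in the abstract could then be computed as `sum(precision_recall(y_true, y_pred, minority)) / 2`.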
About the journal:
The interdisciplinary interfaces of Information Systems (IS) are fast emerging as defining areas of research and development in IS. These developments are largely due to the transformation of Information Technology (IT) towards networked worlds and its effects on global communications and economies. While these developments are shaping the way information is used in all forms of human enterprise, they are also setting the tone and pace of information systems of the future. Major advances in IT, such as client/server systems, the Internet, and the desktop/multimedia computing revolution, have opened numerous important avenues of research and development with considerable practical impact and academic significance. While industry seeks to develop high-performance IS/IT solutions to a variety of contemporary information support needs, academia looks to extend the reach of IS technology into new application domains. Information Systems Frontiers (ISF) aims to provide a common forum for the dissemination of frontline industrial developments of substantial academic value and pioneering academic research of significant practical impact.