Constructive sample partition-based parameter-free sampling for class-overlapped imbalanced data classification
Weiqing Wang, Yuanting Yan, Peng Zhou, Shu Zhao, Yiwen Zhang
Applied Intelligence, vol. 55, no. 6 (published 2025-03-06). DOI: 10.1007/s10489-025-06385-6
Citation count: 0
Abstract
Imbalanced data are common in real applications, ranging from medical diagnosis to economic fraud detection. Data-level methods, which re-balance the distribution between classes, are among the most prevalent ways to handle imbalanced data. Recent research reveals that accounting for class overlap when designing a data-level approach can effectively improve the performance of imbalanced learning. However, most existing data-level methods rely on specific parameters to achieve the desired performance, which makes them hard to generalize to other scenarios. Intractable data difficulty factors pose additional challenges, the most frequent of these being class overlap. Designing an efficient, flexible method that is parameter-free and handles class overlap at the same time therefore remains a challenge. This paper addresses class-overlapped imbalanced data with parameter-free adaptive methods. Specifically, we first propose a parameter-free constructive sample partition (CSP) method, and then design an adaptive parameter-free CSP-based undersampling method (CSPUS) and an adaptive parameter-free CSP-based hybrid sampling method (CSPHS), which balance the class distribution by handling the class overlap in the original data. Numerical experiments on 18 representative high-overlap imbalanced datasets from the KEEL repository, with comparisons against 23 state-of-the-art methods, demonstrate the effectiveness of CSPUS and CSPHS. The source code of the proposed methods is available at https://github.com/ytyancp/CSPS.
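The abstract does not describe how CSP actually partitions samples, so the sketch below does not reproduce CSPUS or CSPHS. To make the general idea of parameter-free, overlap-aware undersampling concrete, it instead uses a classic, well-known technique, Tomek-link removal. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. Such a pair marks a region of class overlap, and deleting the majority-class member of every link cleans that region without any tuning parameter. The function names (`tomek_links`, `tomek_undersample`) are hypothetical. The sketch assumes Euclidean distance and a dataset small enough for a brute-force O(n²) neighbour search.

```python
# Illustrative Tomek-link undersampling (a classic parameter-free cleaning rule).
# This is NOT the paper's CSP/CSPUS algorithm; it only illustrates the idea of
# removing majority samples from class-overlap regions without tuning parameters.
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_neighbor(i: int, X: Sequence[Point]) -> int:
    """Index of the nearest other point to X[i] (Euclidean; ties go to the lowest index)."""
    best, best_d = -1, math.inf
    for j, x in enumerate(X):
        if j == i:
            continue
        d = math.dist(X[i], x)
        if d < best_d:
            best, best_d = j, d
    return best


def tomek_links(X: Sequence[Point], y: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


def tomek_undersample(
    X: Sequence[Point], y: Sequence[int], majority: int
) -> Tuple[List[Point], List[int]]:
    """Drop the majority-class member of every Tomek link; keep everything else."""
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if y[i] == majority else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0), (3.2, 3.0)]
    y = [0, 0, 0, 1, 0, 0]  # class 0 is the majority, class 1 the minority
    print(tomek_links(X, y))  # the overlapping pair (2, 3)
    print(tomek_undersample(X, y, majority=0))
```

Tomek-link removal only cleans the overlap region and does not by itself equalize the class counts, which is why hybrid schemes commonly pair it with an oversampler such as SMOTE.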
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.