A non-parameter oversampling approach for imbalanced data classification based on hybrid natural neighbors
Authors: Junyue Lin, Lu Liang
Journal: Applied Intelligence, volume 55, issue 5 (impact factor 3.4; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2025-01-22
DOI: 10.1007/s10489-025-06236-4
URL: https://link.springer.com/article/10.1007/s10489-025-06236-4
Citations: 0
Abstract
In recent years, researchers have developed numerous interpolation-based oversampling techniques to tackle class imbalance in classification tasks. However, most existing techniques rely on k nearest neighbors (kNN) and therefore require the parameter k to be chosen in advance. Furthermore, they apply a single neighborhood rule to every minority sample, disregarding the positional characteristics of minority samples, which often leads to synthetic noise or overlapping samples. This paper proposes a non-parameter oversampling framework called the hybrid natural neighbor synthetic minority oversampling technique (HNaNSMOTE). HNaNSMOTE determines an appropriate k value through iterative search and adopts a hybrid neighborhood rule for each minority sample to generate more representative and diverse synthetic samples. Specifically, 1) a hybrid natural neighbor search procedure is conducted on the entire dataset to obtain a data-dependent k value, which eliminates the need for manually preset parameters. During this procedure, different natural neighbors are formed for each sample to better identify the positional characteristics of minority samples. 2) To improve the quality of the generated samples, the hybrid natural neighbor (HNaN) concept is proposed. HNaN uses kNN and reverse kNN to find neighbors adaptively based on the distribution of minority samples. Because it takes the presence of majority samples into account, it helps mitigate the generation of synthetic noise and overlapping samples. Experimental results on 32 benchmark binary datasets with three classifiers demonstrate that HNaNSMOTE outperforms numerous state-of-the-art oversampling techniques for imbalanced classification in terms of Sensitivity and G-mean.
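To make the ideas in the abstract more concrete, the following is a minimal sketch of a generic natural-neighbor search followed by SMOTE-style interpolation. It is not the authors' HNaNSMOTE: the abstract does not specify the hybrid rule, so this sketch assumes that interpolation partners are same-class mutual neighbors (points that appear in both the kNN and the reverse-kNN set of a seed). The stopping rule (stop growing k once the number of points without any reverse neighbor no longer changes) follows the usual natural-neighbor formulation, and all function names are hypothetical.

```python
import math
import random


def sorted_neighbors(points):
    """For each point, list the indices of all other points, nearest first."""
    order = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        order.append(others)
    return order


def natural_neighbor_search(points):
    """Grow k until the count of points with no reverse neighbor stops changing.

    Returns (k, knn, rnn): knn[i] holds the k nearest neighbors of i, and
    rnn[i] holds the points that count i among their own k nearest neighbors.
    """
    n = len(points)
    order = sorted_neighbors(points)
    knn = [set() for _ in range(n)]
    rnn = [set() for _ in range(n)]
    prev_orphans = None
    k = 0
    while k < n - 1:
        k += 1
        for i in range(n):
            j = order[i][k - 1]  # k-th nearest neighbor of point i
            knn[i].add(j)
            rnn[j].add(i)
        orphans = sum(1 for r in rnn if not r)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return k, knn, rnn


def oversample(points, labels, minority, n_new, seed=0):
    """Interpolate between minority points and their same-class mutual neighbors.

    Assumption of this sketch: a partner must be in both the kNN and the
    reverse-kNN set of the seed and must itself belong to the minority class.
    """
    rng = random.Random(seed)
    k, knn, rnn = natural_neighbor_search(points)
    seeds = [i for i, y in enumerate(labels) if y == minority]
    partners = {
        i: [j for j in sorted(knn[i] & rnn[i]) if labels[j] == minority]
        for i in seeds
    }
    usable = [i for i in seeds if partners[i]]
    synthetic = []
    while usable and len(synthetic) < n_new:
        i = rng.choice(usable)
        j = rng.choice(partners[i])
        gap = rng.random()  # position along the segment from point i to point j
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(points[i], points[j]))
        )
    return k, synthetic
```

Restricting partners to same-class mutual neighbors is one simple way to keep synthetic points away from majority regions; the paper's hybrid rule may differ, and the value of k found here depends only on the data, not on a user-supplied setting.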
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.