A non-parameter oversampling approach for imbalanced data classification based on hybrid natural neighbors

IF 3.5 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Applied Intelligence Pub Date : 2025-01-22 DOI:10.1007/s10489-025-06236-4

Junyue Lin, Lu Liang

{"title":"A non-parameter oversampling approach for imbalanced data classification based on hybrid natural neighbors","authors":"Junyue Lin, Lu Liang","doi":"10.1007/s10489-025-06236-4","DOIUrl":null,"url":null,"abstract":"<div><p>In recent years, researchers have developed numerous interpolation-based oversampling techniques to tackle class imbalance in classification tasks. However, most existing techniques encounter the challenge of k parameter due to the involvement of k nearest neighbor (kNN). Furthermore, they only adopt one sole neighborhood rule, disregarding the positional characteristics of minority samples. This often leads to the generation of synthetic noise or overlapping samples. This paper proposes a non-parameter oversampling framework called the hybrid natural neighbor synthetic minority oversampling technique (HNaNSMOTE). HNaNSMOTE effectively determines an appropriate k value through iterative search and adopts a hybrid neighborhood rule for each minority sample to generate more representative and diverse synthetic samples. Specifically, 1) a hybrid natural neighbor search procedure is conducted on the entire dataset to obtain a data-related k value, which eliminates the need for manually preset parameters. Different natural neighbors are formed for each sample to better identify the positional characteristics of minority samples during the procedure. 2) To improve the quality of the generated samples, the hybrid natural neighbor (HNaN) concept has been proposed. HNaN utilizes kNN and reverse kNN to find neighbors adaptively based on the distribution of minority samples. It is beneficial for mitigating the generation of synthetic noise or overlapping samples since it takes into account the existence of majority samples. Experimental results on 32 benchmark binary datasets with three classifiers demonstrate that HNaNSMOTE outperforms numerous state-of-the-art oversampling techniques for imbalanced classification in terms of Sensitivity and G-mean.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 5","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06236-4","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

In recent years, researchers have developed numerous interpolation-based oversampling techniques to tackle class imbalance in classification tasks. However, most existing techniques encounter the challenge of k parameter due to the involvement of k nearest neighbor (kNN). Furthermore, they only adopt one sole neighborhood rule, disregarding the positional characteristics of minority samples. This often leads to the generation of synthetic noise or overlapping samples. This paper proposes a non-parameter oversampling framework called the hybrid natural neighbor synthetic minority oversampling technique (HNaNSMOTE). HNaNSMOTE effectively determines an appropriate k value through iterative search and adopts a hybrid neighborhood rule for each minority sample to generate more representative and diverse synthetic samples. Specifically, 1) a hybrid natural neighbor search procedure is conducted on the entire dataset to obtain a data-related k value, which eliminates the need for manually preset parameters. Different natural neighbors are formed for each sample to better identify the positional characteristics of minority samples during the procedure. 2) To improve the quality of the generated samples, the hybrid natural neighbor (HNaN) concept has been proposed. HNaN utilizes kNN and reverse kNN to find neighbors adaptively based on the distribution of minority samples. It is beneficial for mitigating the generation of synthetic noise or overlapping samples since it takes into account the existence of majority samples. Experimental results on 32 benchmark binary datasets with three classifiers demonstrate that HNaNSMOTE outperforms numerous state-of-the-art oversampling techniques for imbalanced classification in terms of Sensitivity and G-mean.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于混合自然邻域的非参数过采样非平衡数据分类方法

近年来，研究人员开发了许多基于插值的过采样技术来解决分类任务中的类不平衡问题。然而，由于k个最近邻（kNN）的参与，大多数现有技术都遇到了k参数的挑战。此外，它们只采用一个单一的邻域规则，而忽略了少数样本的位置特征。这通常会导致合成噪声或重叠样本的产生。本文提出了一种非参数过采样框架，称为混合自然邻域合成少数过采样技术（HNaNSMOTE）。HNaNSMOTE通过迭代搜索有效确定合适的k值，并对每个少数派样本采用混合邻域规则，生成更具代表性和多样性的合成样本。具体而言，1)对整个数据集进行混合自然邻居搜索，获得与数据相关的k值，消除了手动预设参数的需要。每个样本形成不同的自然邻域，以便更好地识别少数样本的位置特征。2)为了提高生成样本的质量，提出了混合自然邻域（HNaN）概念。HNaN基于少数样本的分布，利用kNN和逆kNN自适应地寻找邻居。该方法考虑了多数样本的存在，有利于减少合成噪声或重叠样本的产生。在32个具有三种分类器的基准二值数据集上的实验结果表明，在灵敏度和g均值方面，HNaNSMOTE优于许多最先进的非平衡分类过采样技术。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Applied Intelligence 工程技术-计算机：人工智能

CiteScore

6.60

自引率

20.80%

发文量

1361

审稿时长

5.9 months

期刊介绍： With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.