A non-parameter oversampling approach for imbalanced data classification based on hybrid natural neighbors

IF 3.4 · CAS Zone 2 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Applied Intelligence · Pub Date: 2025-01-22 · DOI: 10.1007/s10489-025-06236-4
Junyue Lin, Lu Liang
Applied Intelligence, volume 55, issue 5 · https://link.springer.com/article/10.1007/s10489-025-06236-4
Citations: 0

Abstract

In recent years, researchers have developed numerous interpolation-based oversampling techniques to tackle class imbalance in classification tasks. However, most of these techniques rely on k nearest neighbors (kNN) and therefore face the problem of choosing the parameter k. Furthermore, they adopt a single neighborhood rule and disregard the positional characteristics of minority samples, which often leads to synthetic noise or overlapping samples. This paper proposes a non-parameter oversampling framework called the hybrid natural neighbor synthetic minority oversampling technique (HNaNSMOTE). HNaNSMOTE determines an appropriate k value through iterative search and adopts a hybrid neighborhood rule for each minority sample to generate more representative and diverse synthetic samples. Specifically, 1) a hybrid natural neighbor search procedure is conducted on the entire dataset to obtain a data-related k value, which eliminates the need for manually preset parameters. During the procedure, different natural neighbors are formed for each sample to better identify the positional characteristics of minority samples. 2) To improve the quality of the generated samples, the hybrid natural neighbor (HNaN) concept is proposed. HNaN uses kNN and reverse kNN to find neighbors adaptively based on the distribution of minority samples. Because it takes the presence of majority samples into account, it helps mitigate the generation of synthetic noise or overlapping samples. Experimental results on 32 benchmark binary datasets with three classifiers show that HNaNSMOTE outperforms numerous state-of-the-art oversampling techniques for imbalanced classification in terms of Sensitivity and G-mean.
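The abstract describes an iterative search that yields a data-related k from kNN and reverse kNN relations, followed by interpolation between minority samples. The sketch below illustrates that general idea with the classic natural-neighbor search (grow k until the number of points with no reverse neighbor stops shrinking) and SMOTE-style linear interpolation. It is not the paper's hybrid procedure: the stopping rule, the restriction to minority-minority pairs, and the function names `natural_neighbor_search` and `interpolate_minority` are assumptions made for illustration only.

```python
import numpy as np


def natural_neighbor_search(X):
    """Return (k, natural) where natural[i] is the set of mutual neighbors of i.

    Assumed stopping rule: stop when every point has a reverse neighbor, or
    when the number of points without one no longer changes between rounds.
    """
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)  # order[i, r] = (r+1)-th nearest point to i

    knn = [set() for _ in range(n)]  # forward neighbors found so far
    rnn = [set() for _ in range(n)]  # reverse neighbors found so far
    prev_orphans = n
    k = 0
    while k < n - 1:
        for i in range(n):
            j = int(order[i, k])
            knn[i].add(j)
            rnn[j].add(i)
        k += 1
        orphans = sum(1 for r in rnn if not r)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    # j is a natural neighbor of i when each lies in the other's k-neighborhood.
    natural = [knn[i] & rnn[i] for i in range(n)]
    return k, natural


def interpolate_minority(X, y, natural, minority_label, n_new, rng):
    """Create n_new synthetic points on segments between minority natural neighbors."""
    pairs = [
        (i, j)
        for i in np.flatnonzero(y == minority_label)
        for j in natural[i]
        if y[j] == minority_label
    ]
    if not pairs:
        return np.empty((0, X.shape[1]))
    synthetic = []
    for _ in range(n_new):
        i, j = pairs[rng.integers(len(pairs))]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic)
```

A call such as `k, natural = natural_neighbor_search(X)` followed by `interpolate_minority(X, y, natural, 1, 10, np.random.default_rng(0))` needs no user-supplied k, which is the property the abstract emphasizes. How HNaNSMOTE chooses between neighborhood rules for each minority sample is not specified in the abstract and is not modeled here.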

Source journal
Applied Intelligence (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months
About the journal: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.