Hybrid oversampling technique for imbalanced pattern recognition: Enhancing performance with Borderline Synthetic Minority oversampling and Generative Adversarial Networks

Machine learning with applications (IF 4.9). Pub Date: 2025-06-01. Epub Date: 2025-03-14. DOI: 10.1016/j.mlwa.2025.100637
Md Manjurul Ahsan , Shivakumar Raman , Yingtao Liu , Zahed Siddique
Volume 20, Article 100637. Journal article. https://www.sciencedirect.com/science/article/pii/S2666827025000209

Abstract

Class imbalance problems (CIP) are one of the challenges in developing unbiased machine learning models for prediction. CIP occurs when data samples are not equally distributed across two or more classes. Several synthetic oversampling techniques have been introduced to balance imbalanced data by oversampling the minority-class samples. A drawback shared by most existing oversampling strategies is that they often fail to focus on the samples that lie near the class border and instead give more attention to extreme observations, which limits the diversity of the data created by oversampling. As a result, marginalization occurs after oversampling. To address these issues, we propose a hybrid oversampling technique, Borderline Synthetic Minority Oversampling and Generative Adversarial Network (BSGAN), which combines the strengths of the Borderline Synthetic Minority Oversampling Technique (Borderline-SMOTE) and Generative Adversarial Networks (GANs). This approach aims to generate more diverse data that follow a Gaussian distribution, marking a significant contribution to the field of artificial intelligence. We tested BSGAN on ten highly imbalanced datasets, demonstrating its application in engineering; it outperformed existing oversampling techniques and produced a more diverse dataset that follows a normal distribution after oversampling.
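
To make the border-focused idea in the abstract concrete, the following is a minimal sketch of the generic Borderline-SMOTE step only, written with the Python standard library. It is not the authors' BSGAN implementation: the abstract does not describe how the GAN stage is connected, so that stage is omitted here. The function names (`find_danger`, `borderline_smote`), the parameters `m` and `k`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return positions of the k candidates closest to point (Euclidean)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def find_danger(X, y, minority=1, m=3):
    """Indices of minority samples whose m nearest neighbours are at least
    half, but not all, majority samples (the "danger" or borderline set)."""
    danger = []
    for i, label in enumerate(y):
        if label != minority:
            continue
        others = [j for j in range(len(X)) if j != i]
        nn = k_nearest(X[i], [X[j] for j in others], m)
        n_major = sum(1 for t in nn if y[others[t]] != minority)
        if m / 2 <= n_major < m:
            danger.append(i)
    return danger


def borderline_smote(X, y, minority=1, m=3, k=2, n_new=10, seed=0):
    """Create up to n_new synthetic minority samples by interpolating between
    a borderline sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    danger = find_danger(X, y, minority, m)
    synthetic = []
    if not danger:
        return synthetic  # no borderline samples, nothing to generate
    for _ in range(n_new):
        i = rng.choice(danger)
        peers = [j for j in min_idx if j != i]
        if not peers:
            break
        nn = k_nearest(X[i], [X[j] for j in peers], min(k, len(peers)))
        j = peers[rng.choice(nn)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Hypothetical toy data: six majority points, six minority points on a line.
X_toy = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
         (4.5, 0), (5.5, 0), (9, 0), (9.5, 0), (10, 0), (10.5, 0)]
y_toy = [0] * 6 + [1] * 6

if __name__ == "__main__":
    print(find_danger(X_toy, y_toy))
    print(borderline_smote(X_toy, y_toy, n_new=3))
```

In this sketch only the minority points that sit among majority neighbours are used as seeds, so new samples are drawn toward the class border rather than from the interior of the minority cluster. In a hybrid scheme such as the one the abstract names, samples like these would presumably feed a GAN in some way, but that wiring is an assumption and is not shown.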