Hybrid oversampling technique for imbalanced pattern recognition: Enhancing performance with Borderline Synthetic Minority oversampling and Generative Adversarial Networks

Machine learning with applications (IF 4.9). Pub Date: 2025-06-01; Epub Date: 2025-03-14. DOI: 10.1016/j.mlwa.2025.100637

Md Manjurul Ahsan, Shivakumar Raman, Yingtao Liu, Zahed Siddique

Volume 20, Article 100637.

Citations: 0

Abstract

Class imbalance problems (CIP) are a major obstacle to developing unbiased machine learning models for prediction. A CIP occurs when data samples are unequally distributed across two or more classes. Several synthetic oversampling techniques have been introduced to balance such data by oversampling the minority class. A drawback shared by most existing oversampling techniques is that they fail to focus on samples lying near the class border and instead give more weight to extreme observations, which limits the diversity of the data created by oversampling. As a result, marginalization occurs after oversampling. To address these issues, we propose a hybrid oversampling technique, Borderline Synthetic Minority Oversampling and Generative Adversarial Network (BSGAN), which combines the strengths of the Borderline Synthetic Minority Oversampling Technique (Borderline-SMOTE) and Generative Adversarial Networks (GANs). The approach aims to generate more diverse data that follow a Gaussian distribution, marking a significant contribution to the field of artificial intelligence. We tested BSGAN on ten highly imbalanced datasets, demonstrating its applicability to engineering problems: it outperformed existing oversampling techniques and produced a more diverse oversampled dataset that follows a normal distribution.
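The abstract describes BSGAN only at a high level. As a hedged illustration of its first ingredient, the sketch below implements the standard Borderline-SMOTE1 rule in plain Python: a minority sample is used as an interpolation seed only if at least half, but not all, of its m nearest neighbours belong to the majority class (the "DANGER" set). The function names `danger_indices` and `borderline_smote`, the defaults m=5 and k=5, and the tuple-based data layout are assumptions for illustration, not the authors' implementation. The GAN stage of BSGAN, and how the authors connect the two stages, is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers: a sketch of Borderline-SMOTE1, not the BSGAN code.
Point = Tuple[float, ...]


def danger_indices(minority: Sequence[Point], majority: Sequence[Point], m: int = 5) -> List[int]:
    """Indices of minority points in the Borderline-SMOTE 'DANGER' set.

    A minority point is borderline when at least half, but not all, of its
    m nearest neighbours (searched over both classes) are majority points.
    All-majority neighbourhoods are treated as noise and skipped.
    """
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for i, p in enumerate(minority):
        # Minority points come first in `labelled`, so index i is p itself.
        others = [item for j, item in enumerate(labelled) if j != i]
        nbrs = sorted(others, key=lambda item: math.dist(p, item[0]))[:m]
        n_major = sum(1 for _, label in nbrs if label == 0)
        if m / 2 <= n_major < m:
            danger.append(i)
    return danger


def borderline_smote(minority: Sequence[Point], majority: Sequence[Point],
                     n_new: int, m: int = 5, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority points seeded only from the DANGER set."""
    rng = random.Random(seed)
    danger = danger_indices(minority, majority, m)
    if not danger or len(minority) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        p = minority[i]
        # k nearest *minority* neighbours of p, excluding p itself.
        pool = [q for j, q in enumerate(minority) if j != i]
        nbrs = sorted(pool, key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(nbrs)
        gap = rng.random()  # uniform in [0, 1)
        # Interpolate on the segment between p and the chosen neighbour.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic
```

On a toy set where only two minority points sit next to majority points, only those two become seeds, so every synthetic point lands between them rather than near the safe minority cluster. This is the "focus on the border" behaviour the abstract contrasts with oversamplers that spread samples toward extreme observations.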


Source journal: Machine learning with applications
Subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications
Self-citation rate: 0.00%
Review time: 98 days