A review on over-sampling techniques in classification of multi-class imbalanced datasets: insights for medical problems

Yuxuan Yang, H. Khorshidi, U. Aickelin
Frontiers in Digital Health. Published 2024-07-26. DOI: 10.3389/fdgth.2024.1430245 (https://doi.org/10.3389/fdgth.2024.1430245)

Abstract

Multi-class classification problems have attracted growing attention, particularly the challenges posed by imbalanced class distributions. To address these challenges, various strategies, including data-level re-sampling and ensemble methods, have been introduced to improve the performance of predictive models and Artificial Intelligence (AI) algorithms when the level of imbalance is severe. While most research and algorithm development has focused on binary classification, health informatics shows increasing interest in multi-class classification on imbalanced datasets. Multi-class imbalance poses more complex challenges, because synthetic data must be generated carefully while preserving the relationships among the multiple classes. This review examines over-sampling methods tailored for medical and other datasets with multi-class imbalance. Of 2,076 peer-reviewed papers identified through searches, 197 eligible papers were thoroughly reviewed for inclusion, and 37 studies were selected for in-depth analysis. These studies are grouped into four categories: metric, adaptive, structure-based, and hybrid approaches. The most significant finding is an emerging trend toward hybrid resampling methods, which combine the strengths of several techniques to address imbalanced data effectively. The paper analyses each selected study in detail, discusses the findings, and outlines directions for future research.
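As a point of reference for what over-sampling means in a multi-class setting, the following is a minimal sketch of the simplest baseline, random over-sampling with replacement. It is not one of the methods reviewed in the paper; the function name `random_oversample` and the toy data are hypothetical, chosen only for illustration. Each minority class is padded with randomly duplicated samples until it matches the largest class.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (illustrative baseline)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical toy data: three classes with 6, 3, and 1 samples.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [1.1], [1.2], [1.3], [2.0]]
y = ["a"] * 6 + ["b"] * 3 + ["c"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because this baseline only copies existing points, it adds no new information and ignores how the classes relate to one another, which is the gap that synthetic-data approaches such as those surveyed aim to close.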