Imbalcbl: addressing deep learning challenges with small and imbalanced datasets

International Journal of System Assurance Engineering and Management | IF 1.6 | Q2, Engineering, Multidisciplinary | Pub Date: 2024-05-01 | DOI: 10.1007/s13198-024-02346-3
Saqib ul Sabha, Assif Assad, Sadaf Shafi, Nusrat Mohi Ud Din, Rayees Ahmad Dar, Muzafar Rasool Bhat
Citations: 0

Abstract

Deep learning, while transformative for computer vision, frequently falters when confronted with small and imbalanced datasets. Despite substantial progress in this domain, prevailing models often underachieve under these constraints. Addressing this, we introduce an innovative contrast-based learning strategy for small and imbalanced data that significantly bolsters the proficiency of deep learning architectures on these challenging datasets. By ingeniously concatenating training images, the effective training dataset expands from n to \(n^2\), affording richer data for model training, even when n is very small. Remarkably, our solution remains indifferent to specific loss functions or network architectures, endorsing its adaptability for diverse classification scenarios. Rigorously benchmarked against four benchmark datasets, our approach was juxtaposed with state-of-the-art oversampling paradigms. The empirical evidence underscores our method’s superior efficacy, outshining contemporaries across metrics like Balanced accuracy, F1 score, and Geometric mean. Noteworthy increments include 7–16% on the Covid-19 dataset, 4–20% for Honey bees, 1–6% on CIFAR-10, and 1–9% on FashionMNIST. In essence, our proposed method offers a potent remedy for the perennial issues stemming from scanty and skewed data in deep learning.

Source journal
CiteScore: 4.30
Self-citation rate: 10.00%
Articles published: 252
Journal description: This Journal is established with a view to cater to increased awareness for high quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to the emergent complex engineering problems. Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach by using a wide variety of cross disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.
Latest articles from this journal
Vision-based gait analysis to detect Parkinson’s disease using hybrid Harris hawks and Arithmetic optimization algorithm with Random Forest classifier
Zero crossing point detection in a distorted sinusoidal signal using random forest classifier
FL-XGBTC: federated learning inspired with XG-boost tuned classifier for YouTube spam content detection
A generalized product adoption model under random marketing conditions
Assessing e-learning platforms in higher education with reference to student satisfaction: a PLS-SEM approach