Uncertainty quantification driven machine learning for improving model accuracy in imbalanced regression tasks

Expert Systems with Applications (IF 7.5; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2024-10-15. DOI: 10.1016/j.eswa.2024.125526
Publisher page: https://www.sciencedirect.com/science/article/pii/S0957417424023935

Abstract

Several factors determine the quality of machine learning models, one of which is dataset quality. One dataset-quality problem is imbalance: an imbalanced dataset contains significantly more data points for certain values of the output variable, which increases the risk of overfitting and degrades prediction accuracy. In this article, we propose using epistemic uncertainty quantification (UQ) of machine learning models to identify rare samples in imbalanced regression problems so that the dataset can be balanced. The developed algorithm, uncertainty quantification-driven imbalanced regression (UQDIR), is guided by UQ to restructure the training set with a suitable weight function using existing samples, eliminating the need for new data collection. After identifying rare samples with UQ, the algorithm selects a sample from the training set, assigns it a resampling weight using the new weight function, and finally resamples it according to that weight. We test UQDIR on several benchmark datasets with different machine learning algorithms, then compare its performance with similar imbalanced regression methods. An application to a metamaterial design problem demonstrates the effectiveness of the algorithm in a real-world scenario. We show that improving the quality of UQ metrics results in improved model accuracy.
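
The abstract does not give the UQ estimator or the weight function, so the following is only a minimal sketch of the general idea (score samples by epistemic uncertainty, turn the score into a resampling weight, then duplicate existing samples). It is not the UQDIR algorithm itself. Everything here is an assumption made for illustration: the bootstrap ensemble of cubic polynomial fits used as an uncertainty proxy, the linear weight function, the synthetic data, and the names ensemble_uncertainty, resampling_counts and rebalance.

```python
import numpy as np

def ensemble_uncertainty(X, y, n_models=20, degree=3, seed=0):
    """Epistemic-uncertainty proxy (assumed, not from the paper):
    std. dev. of predictions across polynomial fits trained on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = np.empty((n_models, n))
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)            # bootstrap indices
        coeffs = np.polyfit(X[idx], y[idx], degree)
        preds[m] = np.polyval(coeffs, X)            # predict on the full training set
    return preds.std(axis=0)

def resampling_counts(uncertainty, max_copies=5):
    """Hypothetical weight function: map uncertainty linearly to 1..max_copies copies."""
    span = uncertainty.max() - uncertainty.min()
    if span == 0:
        return np.ones(len(uncertainty), dtype=int)
    scaled = (uncertainty - uncertainty.min()) / span
    return 1 + np.rint(scaled * (max_copies - 1)).astype(int)

def rebalance(X, y, counts):
    """Duplicate each existing sample according to its count; no new data is created."""
    return np.repeat(X, counts), np.repeat(y, counts)

# Synthetic imbalanced regression set: 500 common samples, 15 rare ones.
rng = np.random.default_rng(42)
X = np.concatenate([rng.uniform(0.0, 1.0, 500), rng.uniform(2.5, 3.0, 15)])
y = np.sin(2.0 * X) + 0.1 * rng.normal(size=X.size)
rare = X > 2.0                                      # known here only because the data is synthetic

unc = ensemble_uncertainty(X, y)
counts = resampling_counts(unc)
X_bal, y_bal = rebalance(X, y, counts)

print("mean uncertainty (rare vs common):", unc[rare].mean(), unc[~rare].mean())
print("training set size before/after:", X.size, X_bal.size)
```

In this sketch the rare region is never labelled for the resampler; it is picked up only through the disagreement between ensemble members, which is the role the abstract assigns to UQ. A real application would swap in the model's own UQ estimate and a weight function tuned to the task.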
Source journal
Expert Systems with Applications
Category: Engineering Technology - Engineering: Electrical & Electronic
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.