ASER: Adapted squared error relevance for rare cases prediction in imbalanced regression

Journal of Chemometrics | IF 2.0 | Zone 4 (Chemistry) | JCR Q1 | Pub Date: 2023-09-08 | DOI: 10.1002/cem.3515

Ying Kou, Guang-Hui Fu

Citations: 0

Abstract

Many real-world data mining applications build predictive models from imbalanced datasets. Imbalanced data can hinder the performance of learning algorithms on rare cases. Although many well-researched solutions exist for classification tasks, most of them cannot be applied directly to regression tasks. One of the challenges in imbalanced regression is to find a suitable evaluation and optimization criterion that improves the predictive ability of the model without introducing severe model bias. Based on the importance of rare cases, this study proposes a new evaluation metric called adapted squared error relevance (ASER), built on newly defined relevance and weighting functions. The metric weights data points according to the importance of rare cases and assigns different weights to losses of the same size at different rare cases, so that the model selected by this metric predicts rare cases better. ASER is compared with SER on 32 real datasets and 9 simulated datasets to verify the predictive performance of the selected model on rare cases. The experimental results show that ASER achieves high prediction performance on rare cases without losing much prediction accuracy on common cases.
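The abstract does not give the ASER relevance or weighting functions, so the sketch below only illustrates the general idea of relevance-based squared error, not the authors' metric. It assumes a hypothetical relevance function `relevance` (distance from the median, scaled to [0, 1]). It also assumes SER at a threshold t is the squared error summed over cases whose relevance is at least t, which is how SER is commonly described for imbalanced regression. All function names are illustrative.

```python
import numpy as np


def relevance(y):
    """Hypothetical relevance phi(y) in [0, 1]: distance from the median,
    scaled so the most extreme target gets relevance 1 (an assumption)."""
    y = np.asarray(y, dtype=float)
    dist = np.abs(y - np.median(y))
    top = dist.max()
    return dist / top if top > 0 else np.zeros_like(y)


def ser_at(y_true, y_pred, phi, t):
    """Squared error summed over the cases whose relevance is at least t."""
    keep = phi >= t
    return float(np.sum((y_true[keep] - y_pred[keep]) ** 2))


def ser_area(y_true, y_pred, phi, steps=1000):
    """Area under the SER_t curve for t in [0, 1] (trapezoidal rule)."""
    ts = np.linspace(0.0, 1.0, steps + 1)
    vals = np.array([ser_at(y_true, y_pred, phi, t) for t in ts])
    return float(np.sum((vals[1:] + vals[:-1]) / 2.0 * np.diff(ts)))


def weighted_sq_error(y_true, y_pred, phi):
    """Closed form of the same area: each squared error weighted by phi."""
    return float(np.sum(phi * (y_true - y_pred) ** 2))


y_true = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # 10.0 is the rare case
phi = relevance(y_true)

pred_a = np.array([1.0, 2.0, 3.0, 4.0, 7.0])    # misses the rare case by 3
pred_b = np.array([1.0, 5.0, 3.0, 4.0, 10.0])   # misses a common case by 3

mse_a = float(np.mean((y_true - pred_a) ** 2))
mse_b = float(np.mean((y_true - pred_b) ** 2))
score_a = weighted_sq_error(y_true, pred_a, phi)
score_b = weighted_sq_error(y_true, pred_b, phi)

print("MSE:", mse_a, mse_b)
print("relevance-weighted:", score_a, score_b)
print("SER area:", ser_area(y_true, pred_a, phi), ser_area(y_true, pred_b, phi))
```

Both toy models have the same mean squared error, but the relevance-weighted score penalizes the model that misses the rare target more heavily, so a selection procedure based on it would prefer the other model. The weighted sum matches the integrated threshold form because a case with relevance phi is counted for exactly the thresholds t up to phi.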

Source journal

Journal of Chemometrics (Chemistry: Analytical Chemistry)

CiteScore: 5.20
Self-citation rate: 8.30%
Review time: 2 months

About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.