ASER: Adapted squared error relevance for rare cases prediction in imbalanced regression
Authors: Ying Kou, Guang-Hui Fu
Journal: Journal of Chemometrics
Published: 2023-09-08
DOI: 10.1002/cem.3515
URL: https://onlinelibrary.wiley.com/doi/10.1002/cem.3515
Many real-world data mining applications build predictive models from imbalanced datasets. Imbalanced data can degrade the performance of learning algorithms on rare cases. Although many well-researched solutions exist for classification tasks, most of them cannot be applied directly to regression tasks. One of the challenges in imbalanced regression is to find a suitable evaluation and optimization criterion that improves the predictive ability of the model without introducing severe model bias. Based on the importance of rare cases, this study proposes a new evaluation metric called adapted squared error relevance (ASER), built on a new relevance function and new weighting functions. The metric weights data points according to the importance of rare cases and assigns different weights to losses of the same size at different rare cases, so that the model selected by this metric predicts rare cases better. ASER is compared with SER on 32 real datasets and 9 simulated datasets to verify the predictive performance of the selected model on rare cases. The experimental results show that ASER achieves high prediction performance on rare cases without losing much prediction accuracy on common cases.
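The general idea of weighting squared errors by the relevance of each target value can be sketched as follows. This is only an illustration of relevance-weighted evaluation, not the ASER definition from the paper: the abstract does not give the relevance or weighting functions, so `relevance` (distance from the median, scaled to [0, 1]) and `weighted_squared_error` are hypothetical stand-ins chosen for simplicity.

```python
import numpy as np


def relevance(y):
    """Hypothetical relevance in [0, 1] (an assumption, not the paper's function).

    Targets at the median get relevance 0; the most extreme target gets 1.
    """
    y = np.asarray(y, dtype=float)
    dist = np.abs(y - np.median(y))
    max_dist = dist.max()
    if max_dist == 0:
        # All targets are identical, so nothing is rare.
        return np.zeros_like(y)
    return dist / max_dist


def weighted_squared_error(y_true, y_pred, weights):
    """Sum of squared errors, each scaled by the weight of its data point."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum(weights * (y_true - y_pred) ** 2))


# Toy data: four common targets near 1.0 and one rare target at 5.0.
y_true = np.array([1.0, 1.1, 0.9, 1.0, 5.0])
phi = relevance(y_true)

# Two hypothetical models with the same plain squared error.
pred_a = y_true + np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # misses the rare case
pred_b = y_true + np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # misses a common case

plain_a = weighted_squared_error(y_true, pred_a, np.ones_like(y_true))
plain_b = weighted_squared_error(y_true, pred_b, np.ones_like(y_true))
score_a = weighted_squared_error(y_true, pred_a, phi)
score_b = weighted_squared_error(y_true, pred_b, phi)

print("plain squared error:", plain_a, plain_b)
print("relevance-weighted: ", score_a, score_b)
```

Under these assumptions the two models are indistinguishable by plain squared error, while the relevance-weighted score penalizes the model that misses the rare case, so model selection by the weighted score would favor the other model.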
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.