Random projection ensemble conformal prediction for high-dimensional classification
Authors: Xiaoyu Qian, Jinru Wu, Ligong Wei, Youwu Lin
Journal: Chemometrics and Intelligent Laboratory Systems, volume 253, Article 105225 (published 2024-09-02)
DOI: 10.1016/j.chemolab.2024.105225
URL: https://www.sciencedirect.com/science/article/pii/S0169743924001655
In classification problems, many high-performing models do not provide confidence estimates or intervals for individual predictions. This lack of reliability poses risks in real-world applications and makes these models difficult to trust. Conformal prediction methods, which are distribution-free and model-free and offer finite-sample coverage guarantees, have recently been widely used to construct prediction sets for classification models. However, traditional conformal prediction methods produce only set-valued results without specifying a definitive predicted class. In complex settings in particular, they do not help models cope with challenges such as high dimensionality, which leads to ambiguous prediction sets with low statistical efficiency, i.e. sets that contain many false classes. This study develops RPECP, a novel Ensemble Conformal Prediction algorithm based on Random Projection and a designed voting strategy, to tackle these challenges. First, a procedure selects approximately oracle random projections and classifiers to best exploit the internal information and structure of the data. Next, using these projections and the underlying classifiers, conformal prediction is performed on new test samples in a lower-dimensional space, yielding multiple independent prediction sets. Finally, the designed voting strategy produces an accurate predicted class and a precise prediction set with high coverage and statistical efficiency. Compared with several base classifiers, RPECP obtains higher classification accuracy; compared with other conformal prediction algorithms, it yields less ambiguous prediction sets with fewer false classes while guaranteeing high coverage. The paper illustrates RPECP's superiority over other methods in four cases: two high-dimensional settings and two real-world datasets.
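The three stages described in the abstract (project, run conformal prediction per projection, vote) can be illustrated with a rough sketch. This is not the authors' RPECP: the abstract does not specify the projection-selection procedure, the nonconformity score, or the voting rule, so everything below is an assumption made for illustration. The sketch draws Gaussian projections without any selection step, uses a nearest-centroid base classifier with distance to the class centroid as the score, applies standard split conformal prediction, and combines the sets by simple majority vote. All function names and parameters (`project`, `fit_centroids`, `conformal_set`, `ensemble_predict`, `n_proj`, `p`) are hypothetical, and the majority rule is not claimed to preserve the coverage level of the individual sets.

```python
import math
import random
from collections import Counter

def project(x, P):
    """Map x (length d) to len(P) dimensions; P is a list of Gaussian rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in P]

def fit_centroids(X, y):
    """Per-class mean vectors: a deliberately simple base classifier."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}

def conformal_set(x, cents, cal_scores, alpha):
    """Split-conformal set: keep every class whose score does not exceed the
    ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    q = sorted(cal_scores)[k - 1] if k <= n else math.inf
    return {c for c, m in cents.items() if math.dist(x, m) <= q}

def ensemble_predict(X_tr, y_tr, X_cal, y_cal, x_new,
                     n_proj=15, p=5, alpha=0.1, seed=0):
    """Assumes every calibration label also appears in the training labels."""
    rng = random.Random(seed)
    d = len(x_new)
    classes = sorted(set(y_tr))
    votes = Counter({c: 0 for c in classes})
    for _ in range(n_proj):
        # Assumption: plain Gaussian projection, no "oracle" selection step.
        P = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(p)]
        cents = fit_centroids([project(x, P) for x in X_tr], y_tr)
        # Nonconformity score: distance to the centroid of the true class.
        cal = [math.dist(project(x, P), cents[c])
               for x, c in zip(X_cal, y_cal)]
        votes.update(conformal_set(project(x_new, P), cents, cal, alpha))
    # Assumed voting rule: keep classes present in more than half of the sets;
    # the point prediction is the class with the most votes.
    final_set = {c for c in classes if votes[c] > n_proj / 2}
    pred = max(classes, key=lambda c: votes[c])
    return pred, final_set

# Toy usage on synthetic 50-dimensional data with two shifted classes.
data_rng = random.Random(1)

def sample(label, d=50):
    return [data_rng.gauss(5.0 * label, 1.0) for _ in range(d)]

y = [i % 2 for i in range(120)]
X = [sample(c) for c in y]
pred, pset = ensemble_predict(X[:80], y[:80], X[80:], y[80:], sample(1))
```

Keeping the calibration split separate from the training split is what gives each per-projection set its split-conformal guarantee; how the paper's voting strategy combines those sets may differ substantially from the majority rule used here.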
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets for testing the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.