稀疏高阶交互模型置信机

IF 0.8 4区数学 Q3 STATISTICS & PROBABILITY Stat Pub Date : 2024-02-05 DOI:10.1002/sta4.633

Diptesh Das, Eugene Ndiaye, Ichiro Takeuchi

{"title":"稀疏高阶交互模型置信机","authors":"Diptesh Das, Eugene Ndiaye, Ichiro Takeuchi","doi":"10.1002/sta4.633","DOIUrl":null,"url":null,"abstract":"In predictive modelling for high-stake decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for obtaining the coverage of prediction results with fewer theoretical assumptions. To obtain the prediction set by so-called full-CP, we need to refit the predictor for all possible values of prediction results, which is only possible for simple predictors. For complex predictors such as random forests (RFs) or neural networks (NNs), split-CP is often employed where the data is split into two parts: one part for fitting and another for computing the prediction set. Unfortunately, because of the reduced sample size, split-CP is inferior to full-CP both in fitting as well as prediction set computation. In this paper, we develop a full-CP of sparse high-order interaction model (SHIM), which is sufficiently flexible as it can take into account high-order interactions among variables. We resolve the computational challenge for full-CP of SHIM by introducing a novel approach called homotopy mining. Through numerical experiments, we demonstrate that SHIM is as accurate as complex predictors such as RF and NN and enjoys the superior statistical power of full-CP.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"35 1","pages":""},"PeriodicalIF":0.8000,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A confidence machine for sparse high-order interaction model\",\"authors\":\"Diptesh Das, Eugene Ndiaye, Ichiro Takeuchi\",\"doi\":\"10.1002/sta4.633\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In predictive modelling for high-stake decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for obtaining the coverage of prediction results with fewer theoretical assumptions. To obtain the prediction set by so-called full-CP, we need to refit the predictor for all possible values of prediction results, which is only possible for simple predictors. For complex predictors such as random forests (RFs) or neural networks (NNs), split-CP is often employed where the data is split into two parts: one part for fitting and another for computing the prediction set. Unfortunately, because of the reduced sample size, split-CP is inferior to full-CP both in fitting as well as prediction set computation. In this paper, we develop a full-CP of sparse high-order interaction model (SHIM), which is sufficiently flexible as it can take into account high-order interactions among variables. We resolve the computational challenge for full-CP of SHIM by introducing a novel approach called homotopy mining. Through numerical experiments, we demonstrate that SHIM is as accurate as complex predictors such as RF and NN and enjoys the superior statistical power of full-CP.\",\"PeriodicalId\":56159,\"journal\":{\"name\":\"Stat\",\"volume\":\"35 1\",\"pages\":\"\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2024-02-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Stat\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1002/sta4.633\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Stat","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/sta4.633","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}

引用次数: 0

摘要

在用于高风险决策的预测建模中，预测器不仅要准确，而且要可靠。共形预测（CP）是一种以较少理论假设获得预测结果覆盖面的有前途的方法。要通过所谓的全共形预测（full-CP）获得预测集，我们需要针对预测结果的所有可能值重新拟合预测器，而这只适用于简单的预测器。对于随机森林（RF）或神经网络（NN）等复杂预测器，通常采用拆分式 CP，即将数据拆分为两部分：一部分用于拟合，另一部分用于计算预测集。遗憾的是，由于样本量减少，拆分式 CP 在拟合和预测集计算方面都不如完全式 CP。在本文中，我们开发了一种稀疏高阶交互模型（SHIM）的全CP，它具有足够的灵活性，可以考虑变量间的高阶交互作用。我们通过引入一种名为同调挖掘（homotopy mining）的新方法，解决了 SHIM 全 CP 的计算难题。通过数值实验，我们证明了 SHIM 与 RF 和 NN 等复杂预测器一样准确，并具有全 CP 的卓越统计能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A confidence machine for sparse high-order interaction model

In predictive modelling for high-stake decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for obtaining the coverage of prediction results with fewer theoretical assumptions. To obtain the prediction set by so-called full-CP, we need to refit the predictor for all possible values of prediction results, which is only possible for simple predictors. For complex predictors such as random forests (RFs) or neural networks (NNs), split-CP is often employed where the data is split into two parts: one part for fitting and another for computing the prediction set. Unfortunately, because of the reduced sample size, split-CP is inferior to full-CP both in fitting as well as prediction set computation. In this paper, we develop a full-CP of sparse high-order interaction model (SHIM), which is sufficiently flexible as it can take into account high-order interactions among variables. We resolve the computational challenge for full-CP of SHIM by introducing a novel approach called homotopy mining. Through numerical experiments, we demonstrate that SHIM is as accurate as complex predictors such as RF and NN and enjoys the superior statistical power of full-CP.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Stat Decision Sciences-Statistics, Probability and Uncertainty

CiteScore

1.10

自引率

0.00%

发文量

期刊介绍： Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell. Stat is characterised by: • Speed - a high-quality review process that aims to reach a decision within 20 days of submission. • Concision - a maximum article length of 10 pages of text, not including references. • Supporting materials - inclusion of electronic supporting materials including graphs, video, software, data and images. • Scope - addresses all areas of statistics and interdisciplinary areas. Stat is a scientific journal for the international community of statisticians and researchers and practitioners in allied quantitative disciplines.