Simulation-calibration testing for inference in Lasso regressions

arXiv - STAT - Statistics Theory. Pub Date: 2024-09-03. DOI: arxiv-2409.02269

Matthieu Pluntz, Cyril Dalmasso, Pascale Tubert-Bitter, Ismail Ahmed



Abstract

We propose a test of the significance of a variable appearing on the Lasso path and use it in a procedure for selecting one of the models of the Lasso path, controlling the Family-Wise Error Rate. Our null hypothesis depends on a set A of already selected variables and states that it contains all the active variables. We focus on the regularization parameter value from which a first variable outside A is selected. As the test statistic, we use this quantity's conditional p-value, which we define conditional on the non-penalized estimated coefficients of the model restricted to A. We estimate this by simulating outcome vectors and then calibrating them on the observed outcome's estimated coefficients. We adapt the calibration heuristically to the case of generalized linear models in which it turns into an iterative stochastic procedure. We prove that the test controls the risk of selecting a false positive in linear models, both under the null hypothesis and, under a correlation condition, when A does not contain all active variables. We assess the performance of our procedure through extensive simulation studies. We also illustrate it in the detection of exposures associated with drug-induced liver injuries in the French pharmacovigilance database.
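The simulate-then-calibrate idea can be illustrated with a minimal sketch for the simplest special case only: a Gaussian linear model with an unpenalized intercept and an empty set A. In that case the regularization value from which a first variable enters the Lasso path has a closed form, so no path solver is needed. The function names below are hypothetical, the Lasso objective is assumed to be scaled as (1/2n)||y - b0 - Xb||^2 + lambda*||b||_1, and matching the residual standard deviation in addition to the intercept is an assumption of this sketch rather than something the abstract states. The paper's procedure for a general A and for generalized linear models is not reproduced here.

```python
import random
import statistics


def first_entry_lambda(X, y):
    """Lambda from which a first variable enters the Lasso path.

    Assumes an unpenalized intercept and the objective
    (1/2n)*||y - b0 - X b||^2 + lambda*||b||_1, for which the entry
    point is max_j |x_j^T (y - mean(y))| / n.
    """
    n = len(y)
    y_bar = statistics.fmean(y)
    resid = [yi - y_bar for yi in y]
    p = len(X[0])
    return max(
        abs(sum(X[i][j] * resid[i] for i in range(n))) / n for j in range(p)
    )


def calibrate(y_sim, y_obs):
    """Shift and rescale y_sim so that its mean and standard deviation
    equal those of y_obs (assumption: with A empty, these play the role
    of the non-penalized estimates that the test conditions on)."""
    m_obs, s_obs = statistics.fmean(y_obs), statistics.stdev(y_obs)
    m_sim, s_sim = statistics.fmean(y_sim), statistics.stdev(y_sim)
    return [m_obs + (v - m_sim) * s_obs / s_sim for v in y_sim]


def conditional_p_value(X, y, n_sim=500, seed=0):
    """Monte Carlo estimate of the conditional p-value of the observed
    entry lambda under the null that no variable is active."""
    rng = random.Random(seed)
    lam_obs = first_entry_lambda(X, y)
    exceed = 0
    for _ in range(n_sim):
        # Simulate an outcome vector under the null, then calibrate it
        # on the observed outcome's estimates.
        y_sim = [rng.gauss(0.0, 1.0) for _ in y]
        y_cal = calibrate(y_sim, y)
        if first_entry_lambda(X, y_cal) >= lam_obs:
            exceed += 1
    # The +1 terms count the observed vector itself, which keeps the
    # estimate strictly positive.
    return (1 + exceed) / (1 + n_sim)
```

Calling `conditional_p_value(X, y)` with `X` as a list of rows and `y` as a list of outcomes returns a value in (0, 1]. A small value suggests that the first variable to enter the path does so at a larger regularization value than would be expected if no variable were active.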


