Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach.

IF 4.3 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Journal of Machine Learning Research Pub Date : 2021-01-01

Zhe Fei, Yi Li

{"title":"Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach.","authors":"Zhe Fei, Yi Li","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into prevention strategies or treatment decisions for both patients and physicians. High dimensional inference, including confidence intervals and hypothesis testing, has sparked much interest. While much work has been done in the linear regression setting, there is lack of literature on inference for high dimensional generalized linear models. We propose a novel and computationally feasible method, which accommodates a variety of outcome types, including normal, binomial, and Poisson data. We use a \"splitting and smoothing\" approach, which splits samples into two parts, performs variable selection using one part and conducts partial regression with the other part. Averaging the estimates over multiple random splits, we obtain the smoothed estimates, which are numerically stable. We show that the estimates are consistent, asymptotically normal, and construct confidence intervals with proper coverage probabilities for all predictors. We examine the finite sample performance of our method by comparing it with the existing methods and applying it to analyze a lung cancer cohort study.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8442657/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Machine Learning Research","FirstCategoryId":"94","ListUrlMain":"","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into prevention strategies or treatment decisions for both patients and physicians. High dimensional inference, including confidence intervals and hypothesis testing, has sparked much interest. While much work has been done in the linear regression setting, there is lack of literature on inference for high dimensional generalized linear models. We propose a novel and computationally feasible method, which accommodates a variety of outcome types, including normal, binomial, and Poisson data. We use a "splitting and smoothing" approach, which splits samples into two parts, performs variable selection using one part and conducts partial regression with the other part. Averaging the estimates over multiple random splits, we obtain the smoothed estimates, which are numerically stable. We show that the estimates are consistent, asymptotically normal, and construct confidence intervals with proper coverage probabilities for all predictors. We examine the finite sample performance of our method by comparing it with the existing methods and applying it to analyze a lung cancer cohort study.

Abstract Image

微信好友朋友圈 QQ好友复制链接

本刊更多论文

高维广义线性模型的估计与推理:一种分裂与平滑方法。

现代生物医学研究的重点已逐渐转向解释和估计高维预测因子对疾病风险的联合效应。量化这些估计中的不确定性可能为患者和医生提供有价值的预防策略或治疗决策。包括置信区间和假设检验在内的高维推理引起了人们的极大兴趣。虽然在线性回归设置方面已经做了很多工作，但缺乏关于高维广义线性模型推理的文献。我们提出了一种新的和计算上可行的方法，它适用于各种结果类型，包括正态，二项和泊松数据。我们使用“分裂和平滑”的方法，将样本分成两部分，使用一部分进行变量选择，并对另一部分进行部分回归。对多个随机分割的估计进行平均，得到数值稳定的平滑估计。我们证明了估计是一致的，渐近正态的，并为所有预测因子构建了具有适当覆盖概率的置信区间。我们通过将我们的方法与现有方法进行比较，并将其应用于分析肺癌队列研究，来检验我们方法的有限样本性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Machine Learning Research 工程技术-计算机：人工智能

CiteScore

18.80

自引率

0.00%

发文量

审稿时长

3 months

期刊介绍： The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing. JMLR seeks previously unpublished papers on machine learning that contain: new principled algorithms with sound empirical validation, and with justification of theoretical, psychological, or biological nature; experimental and/or theoretical studies yielding new insight into the design and behavior of learning in intelligent systems; accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods; formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks; development of new analytical frameworks that advance theoretical studies of practical learning methods; computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.