排序稀疏性：用于选择和估计特征交互和多项式的令人信服的正则化框架

IF 16.4 1区化学 Q1 CHEMISTRY, MULTIDISCIPLINARY Accounts of Chemical Research Pub Date : 2022-01-25 DOI:10.1007/s10182-021-00431-7

Ryan A. Peterson, Joseph E. Cavanaugh

{"title":"排序稀疏性：用于选择和估计特征交互和多项式的令人信服的正则化框架","authors":"Ryan A. Peterson, Joseph E. Cavanaugh","doi":"10.1007/s10182-021-00431-7","DOIUrl":null,"url":null,"abstract":"<div><p>We explore and illustrate the concept of ranked sparsity, a phenomenon that often occurs naturally in modeling applications when an expected disparity exists in the quality of information between different feature sets. Its presence can cause traditional and modern model selection methods to fail because such procedures commonly presume that each potential parameter is equally worthy of entering into the final model—we call this presumption “covariate equipoise.” However, this presumption does not always hold, especially in the presence of derived variables. For instance, when all possible interactions are considered as candidate predictors, the premise of covariate equipoise will often produce over-specified and opaque models. The sheer number of additional candidate variables grossly inflates the number of false discoveries in the interactions, resulting in unnecessarily complex and difficult-to-interpret models with many (truly spurious) interactions. We suggest a modeling strategy that requires a stronger level of evidence in order to allow certain variables (e.g., interactions) to be selected in the final model. This ranked sparsity paradigm can be implemented with the sparsity-ranked lasso (SRL). We compare the performance of SRL relative to competing methods in a series of simulation studies, showing that the SRL is a very attractive method because it is fast and accurate and produces more transparent models (with fewer false interactions). We illustrate its utility in an application to predict the survival of lung cancer patients using a set of gene expression measurements and clinical covariates, searching in particular for gene–environment interactions.</p></div>","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":null,"pages":null},"PeriodicalIF":16.4000,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00431-7.pdf","citationCount":"1","resultStr":"{\"title\":\"Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials\",\"authors\":\"Ryan A. Peterson, Joseph E. Cavanaugh\",\"doi\":\"10.1007/s10182-021-00431-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>We explore and illustrate the concept of ranked sparsity, a phenomenon that often occurs naturally in modeling applications when an expected disparity exists in the quality of information between different feature sets. Its presence can cause traditional and modern model selection methods to fail because such procedures commonly presume that each potential parameter is equally worthy of entering into the final model—we call this presumption “covariate equipoise.” However, this presumption does not always hold, especially in the presence of derived variables. For instance, when all possible interactions are considered as candidate predictors, the premise of covariate equipoise will often produce over-specified and opaque models. The sheer number of additional candidate variables grossly inflates the number of false discoveries in the interactions, resulting in unnecessarily complex and difficult-to-interpret models with many (truly spurious) interactions. We suggest a modeling strategy that requires a stronger level of evidence in order to allow certain variables (e.g., interactions) to be selected in the final model. This ranked sparsity paradigm can be implemented with the sparsity-ranked lasso (SRL). We compare the performance of SRL relative to competing methods in a series of simulation studies, showing that the SRL is a very attractive method because it is fast and accurate and produces more transparent models (with fewer false interactions). We illustrate its utility in an application to predict the survival of lung cancer patients using a set of gene expression measurements and clinical covariates, searching in particular for gene–environment interactions.</p></div>\",\"PeriodicalId\":1,\"journal\":{\"name\":\"Accounts of Chemical Research\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":16.4000,\"publicationDate\":\"2022-01-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10182-021-00431-7.pdf\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Accounts of Chemical Research\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10182-021-00431-7\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"100","ListUrlMain":"https://link.springer.com/article/10.1007/s10182-021-00431-7","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 1

摘要

我们探索并说明了排序稀疏性的概念，当不同特征集之间的信息质量存在预期差异时，在建模应用程序中经常自然发生这种现象。它的存在会导致传统和现代模型选择方法的失败，因为这些方法通常假设每个潜在参数都同样值得进入最终模型-我们称之为“协变量均衡”。然而，这种假设并不总是成立，特别是在存在衍生变量的情况下。例如，当所有可能的相互作用被视为候选预测因子时，协变量均衡的前提通常会产生过度指定和不透明的模型。额外候选变量的绝对数量大大增加了相互作用中错误发现的数量，导致具有许多(真正虚假的)相互作用的不必要的复杂和难以解释的模型。我们建议一种建模策略，它需要更强的证据水平，以便允许在最终模型中选择某些变量(例如，相互作用)。这种分级稀疏性范例可以用稀疏度分级套索(SRL)来实现。我们在一系列仿真研究中比较了SRL相对于竞争方法的性能，表明SRL是一种非常有吸引力的方法，因为它快速和准确，并且产生更透明的模型(具有更少的错误交互)。我们说明了它在预测肺癌患者生存的应用中的效用，使用一组基因表达测量和临床协变量，特别是搜索基因与环境的相互作用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials

We explore and illustrate the concept of ranked sparsity, a phenomenon that often occurs naturally in modeling applications when an expected disparity exists in the quality of information between different feature sets. Its presence can cause traditional and modern model selection methods to fail because such procedures commonly presume that each potential parameter is equally worthy of entering into the final model—we call this presumption “covariate equipoise.” However, this presumption does not always hold, especially in the presence of derived variables. For instance, when all possible interactions are considered as candidate predictors, the premise of covariate equipoise will often produce over-specified and opaque models. The sheer number of additional candidate variables grossly inflates the number of false discoveries in the interactions, resulting in unnecessarily complex and difficult-to-interpret models with many (truly spurious) interactions. We suggest a modeling strategy that requires a stronger level of evidence in order to allow certain variables (e.g., interactions) to be selected in the final model. This ranked sparsity paradigm can be implemented with the sparsity-ranked lasso (SRL). We compare the performance of SRL relative to competing methods in a series of simulation studies, showing that the SRL is a very attractive method because it is fast and accurate and produces more transparent models (with fewer false interactions). We illustrate its utility in an application to predict the survival of lung cancer patients using a set of gene expression measurements and clinical covariates, searching in particular for gene–environment interactions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Accounts of Chemical Research 化学-化学综合

CiteScore

31.40

自引率

1.10%

发文量

312

审稿时长

2 months

期刊介绍： Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance. Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.

期刊最新文献

Management of Cholesteatoma: Hearing Rehabilitation. Congenital Cholesteatoma. Evaluation of Cholesteatoma. Management of Cholesteatoma: Extension Beyond Middle Ear/Mastoid. Recidivism and Recurrence.