Controlling the false discovery rate in transformational sparsity: Split Knockoffs

IF: 3.1 | Zone 1 (Mathematics) | Q1, Statistics & Probability | Journal of the Royal Statistical Society Series B-Statistical Methodology | Pub Date: 2023-11-14 | DOI: 10.1093/jrsssb/qkad126

Yang Cao, Xinwei Sun, Yuan Yao


Citations: 2



Controlling the false discovery rate in transformational sparsity: Split Knockoffs

Abstract: Controlling the False Discovery Rate (FDR) in a variable selection procedure is critical for reproducible discoveries, and it has been extensively studied in sparse linear models. However, the problem remains largely open in scenarios where the sparsity constraint is imposed not directly on the parameters but on a linear transformation of the parameters to be estimated. Examples of such scenarios include total variations, wavelet transforms, fused LASSO, and trend filtering. In this paper, we propose a data-adaptive FDR control method, called the Split Knockoff method, for this transformational sparsity setting. The proposed method exploits both variable splitting and data splitting. The linear transformation constraint is relaxed to its Euclidean proximity in a lifted parameter space, which yields an orthogonal design that enables the orthogonal Split Knockoff construction. To overcome the challenge that exchangeability fails because of the heterogeneous noise introduced by the transformation, new inverse supermartingale structures are developed via data splitting for provable FDR control without sacrificing power. Simulation experiments demonstrate that the proposed methodology achieves the desired FDR and power. We also provide an application to an Alzheimer's disease study, where atrophied brain regions and their abnormal connections can be discovered from a structural Magnetic Resonance Imaging dataset.
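To make the setting in the abstract concrete, here is a minimal Python sketch of what "transformational sparsity" can look like. It is an illustration only, not the authors' implementation: the first-difference matrix `D` (a total-variation style transform), the toy coefficient vector, the helper names, and in particular the form of `relaxed_objective` (least-squares fit plus a Euclidean proximity term between `D @ beta` and a lifted variable `gamma`, plus an L1 penalty on `gamma`) are assumptions chosen to mirror the description above.

```python
import numpy as np


def first_difference(p):
    """Return the (p-1) x p matrix D with (D @ beta)[i] = beta[i+1] - beta[i]."""
    return np.diff(np.eye(p), axis=0)


def relaxed_objective(y, X, D, beta, gamma, lam, nu):
    """Assumed form of a relaxed objective in the lifted space (beta, gamma).

    The hard constraint gamma = D @ beta is replaced by a squared Euclidean
    proximity term weighted by 1 / (2 * nu); sparsity is imposed on gamma.
    """
    n = len(y)
    fit = np.sum((y - X @ beta) ** 2) / (2 * n)
    proximity = np.sum((D @ beta - gamma) ** 2) / (2 * nu)
    return fit + proximity + lam * np.sum(np.abs(gamma))


def false_discovery_proportion(selected, true_support):
    """Fraction of selected indices that are not in the true support."""
    selected = {int(j) for j in selected}
    true_support = {int(j) for j in true_support}
    return len(selected - true_support) / max(len(selected), 1)


# A piecewise-constant beta is dense itself, but D @ beta is sparse.
beta = np.array([0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, -1.0, -1.0, -1.0])
D = first_difference(len(beta))
gamma = D @ beta
true_support = np.flatnonzero(gamma)  # positions of the two jumps

# A hypothetical selection containing both true jumps and one false one.
fdp = false_discovery_proportion([2, 6, 4], true_support)
```

In this toy example every entry of `beta` after the first jump is nonzero, yet only two entries of `D @ beta` are nonzero, so a selection procedure has to control false discoveries among the rows of `D` rather than among the coordinates of `beta`. The FDR is the expectation of the proportion computed by `false_discovery_proportion` over repeated data.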


Source journal

Journal of the Royal Statistical Society Series B-Statistical Methodology (Mathematics: Statistics & Probability)

CiteScore: 8.80
Self-citation rate: 0.00%
Articles published: (no value given)
Review time: >12 weeks

About the journal: Series B (Statistical Methodology) aims to publish high-quality papers on the methodological aspects of statistics and data science more broadly. The objective of papers should be to contribute to the understanding of statistical methodology and/or to develop and improve statistical methods; any mathematical theory should be directed towards these aims. The kinds of contribution considered include descriptions of new methods of collecting or analysing data, with the underlying theory, an indication of the scope of application and preferably a real example. Also considered are comparisons, critical evaluations and new applications of existing methods, contributions to probability theory which have a clear practical bearing (including the formulation and analysis of stochastic models), statistical computation or simulation where original methodology is involved and original contributions to the foundations of statistical science. Reviews of methodological techniques are also considered. A paper, even if correct and well presented, is likely to be rejected if it only presents straightforward special cases of previously published work, if it is of mathematical interest only, if it is too long in relation to the importance of the new material that it contains or if it is dominated by computations or simulations of a routine nature.