Pub Date: 2024-09-02 | eCollection Date: 2025-02-01 | DOI: 10.1093/jrsssb/qkae088
Runshi Tang, Ming Yuan, Anru R Zhang
This paper introduces a novel framework called Mode-wise Principal Subspace Pursuit (MOP-UP) to extract hidden variations in both the row and column dimensions for matrix data. To enhance the understanding of the framework, we introduce a class of matrix-variate spiked covariance models that serve as inspiration for the development of the MOP-UP algorithm. The MOP-UP algorithm consists of two steps: Average Subspace Capture (ASC) and Alternating Projection. These steps are specifically designed to capture the row-wise and column-wise dimension-reduced subspaces which contain the most informative features of the data. ASC utilizes a novel average projection operator as initialization and achieves exact recovery in the noiseless setting. We analyse the convergence and non-asymptotic error bounds of MOP-UP, introducing a blockwise matrix eigenvalue perturbation bound that proves the desired bound, where classic perturbation bounds fail. The effectiveness and practical merits of the proposed framework are demonstrated through experiments on both simulated and real datasets. Lastly, we discuss generalizations of our approach to higher-order data.
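The row- and column-wise subspaces described above can be illustrated with plain mode-wise PCA: average the row-mode and column-mode covariance matrices over the sample and take top eigenvectors. This is a simplified stand-in for MOP-UP (the paper's ASC average projection operator and alternating projection refinement are more elaborate); in the noiseless spiked model it already recovers the subspaces exactly.

```python
import numpy as np

def modewise_subspaces(X, r1, r2):
    """Estimate row- and column-mode principal subspaces of matrix samples.

    X: array of shape (n, p, q) -- n matrix observations.
    Returns orthonormal bases U (p x r1) and V (q x r2).
    Simplified mode-wise PCA, not the paper's full ASC + alternating
    projection algorithm.
    """
    # Averaged row-mode and column-mode covariance matrices.
    M1 = np.mean([Xi @ Xi.T for Xi in X], axis=0)  # (p, p)
    M2 = np.mean([Xi.T @ Xi for Xi in X], axis=0)  # (q, q)
    # eigh returns eigenvalues in ascending order; keep the top ones.
    U = np.linalg.eigh(M1)[1][:, -r1:]
    V = np.linalg.eigh(M2)[1][:, -r2:]
    return U, V

# Noiseless spiked model X_i = U0 F_i V0^T: exact subspace recovery.
rng = np.random.default_rng(0)
p, q, r1, r2, n = 8, 6, 2, 2, 50
U0 = np.linalg.qr(rng.standard_normal((p, r1)))[0]
V0 = np.linalg.qr(rng.standard_normal((q, r2)))[0]
X = np.stack([U0 @ rng.standard_normal((r1, r2)) @ V0.T for _ in range(n)])
U, V = modewise_subspaces(X, r1, r2)
err_U = np.linalg.norm(U @ U.T - U0 @ U0.T)  # projector distance, ~0 here
err_V = np.linalg.norm(V @ V.T - V0 @ V0.T)
```

Comparing projectors (U U^T) rather than bases sidesteps the rotational non-identifiability of the eigenvector basis within each subspace.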
"Mode-wise principal subspace pursuit and matrix spiked covariance model." Journal of the Royal Statistical Society Series B: Statistical Methodology 87(1): 232-255. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809223/pdf/
Pub Date: 2024-08-05 | eCollection Date: 2025-02-01 | DOI: 10.1093/jrsssb/qkae082
Faming Liang, Sehwan Kim, Yan Sun
While fiducial inference is widely regarded as R.A. Fisher's big blunder, the goal he initially set, 'inferring the uncertainty of model parameters on the basis of observations', has been continually pursued by many statisticians. To this end, we develop a new statistical inference method called extended Fiducial inference (EFI). The new method achieves the goal of fiducial inference by leveraging advanced statistical computing techniques while remaining scalable for big data. Extended Fiducial inference involves jointly imputing random errors realized in observations using stochastic gradient Markov chain Monte Carlo and estimating the inverse function using a sparse deep neural network (DNN). The consistency of the sparse DNN estimator ensures that the uncertainty embedded in observations is properly propagated to model parameters through the estimated inverse function, thereby validating downstream statistical inference. Compared to frequentist and Bayesian methods, EFI offers significant advantages in parameter estimation and hypothesis testing. Specifically, EFI provides higher fidelity in parameter estimation, especially when outliers are present in the observations; and eliminates the need for theoretical reference distributions in hypothesis testing, thereby automating the statistical inference process. Extended Fiducial inference also provides an innovative framework for semisupervised learning.
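The structural-equation inversion at the heart of fiducial inference can be shown on a toy normal location model, where the inverse map is available in closed form. This is only an illustration of the idea EFI generalizes (EFI imputes the errors by stochastic gradient MCMC and fits the inverse map with a sparse DNN); the model and variable names below are illustrative.

```python
import numpy as np

def fiducial_location_samples(y, n_draws, rng):
    """Toy fiducial inference for Y_i = mu + eps_i, eps_i ~ N(0, 1).

    Inverting the structural equation gives mu = mean(y) - mean(eps);
    drawing fresh errors propagates their uncertainty to mu.
    """
    eps = rng.standard_normal((n_draws, len(y)))
    return y.mean() - eps.mean(axis=1)

rng = np.random.default_rng(1)
y = rng.standard_normal(25) + 3.0            # true mu = 3
mu_samples = fiducial_location_samples(y, 20_000, rng)
# The fiducial distribution here is N(ybar, 1/n): centred at the
# sample mean with standard deviation 1/sqrt(25) = 0.2.
```

No prior is placed on mu; all randomness flows from the sampled errors through the inverted structural equation.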
"Extended fiducial inference: toward an automated process of statistical inference." Journal of the Royal Statistical Society Series B: Statistical Methodology 87(1): 98-131. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809222/pdf/
Pub Date: 2024-06-14 | eCollection Date: 2025-02-01 | DOI: 10.1093/jrsssb/qkae042
Paula Gablenz, Chiara Sabatti
We consider problems where many, somewhat redundant, hypotheses are tested and we are interested in reporting the most precise rejections, with false discovery rate (FDR) control. This is the case, for example, when researchers are interested both in individual hypotheses and in group hypotheses corresponding to intersections of sets of the original hypotheses, at several resolution levels. A concrete application is in genome-wide association studies, where, depending on the signal strengths, it might be possible to resolve the influence of individual genetic variants on a phenotype with greater or lower precision. To adapt to the unknown signal strength, analyses are conducted at multiple resolutions and researchers are most interested in the more precise discoveries. Assuring FDR control on the reported findings with these adaptive searches is, however, often impossible. To design a multiple comparison procedure that allows for an adaptive choice of resolution with FDR control, we leverage e-values and linear programming. We adapt this approach to problems where knockoffs and group knockoffs have been successfully applied to test conditional independence hypotheses. We demonstrate its efficacy by analysing data from the UK Biobank.
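A basic ingredient behind e-value-based FDR control is the e-BH procedure, which controls FDR at level alpha for arbitrarily dependent e-values. The sketch below shows e-BH on its own; the paper's contribution, selecting rejection resolutions via linear programming over knockoff e-values, builds on top of this.

```python
import numpy as np

def e_benjamini_hochberg(e, alpha):
    """e-BH: reject the k hypotheses with the largest e-values, where
    k = max{ i : e_(i) >= n / (alpha * i) } with e_(i) sorted in
    decreasing order.  Controls FDR at alpha under arbitrary dependence."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    order = np.argsort(e)[::-1]          # indices by decreasing e-value
    k = 0
    for i in range(1, n + 1):
        if e[order[i - 1]] >= n / (alpha * i):
            k = i
    return set(order[:k].tolist())

# One strong e-value among weak ones: only it clears n/(alpha * 1) = 16.
rejected = e_benjamini_hochberg([20.0, 1.0, 1.0, 1.0], alpha=0.25)
```

Unlike the p-value BH procedure, no assumption on the dependence structure of the e-values is needed, which is what makes e-values attractive for combining redundant hypotheses across resolutions.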
"Catch me if you can: signal localization with knockoff e-values." Journal of the Royal Statistical Society Series B: Statistical Methodology 87(1): 56-73. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809227/pdf/
Pub Date: 2024-05-03 | eCollection Date: 2024-11-01 | DOI: 10.1093/jrsssb/qkae034
Zhiqiang Tan
Consider sensitivity analysis for estimating average treatment effects under unmeasured confounding, assumed to satisfy a marginal sensitivity model. At the population level, we provide new representations for the sharp population bounds and doubly robust estimating functions. We also derive new, relaxed population bounds, depending on weighted linear outcome quantile regression. At the sample level, we develop new methods and theory for obtaining not only doubly robust point estimators for the relaxed population bounds with respect to misspecification of a propensity score model or an outcome mean regression model, but also model-assisted confidence intervals which are valid if the propensity score model is correctly specified, but the outcome quantile and mean regression models may be misspecified. The relaxed population bounds reduce to the sharp bounds if outcome quantile regression is correctly specified. For a linear outcome mean regression model, the confidence intervals are also doubly robust. Our methods involve regularized calibrated estimation, with Lasso penalties but carefully chosen loss functions, for fitting propensity score and outcome mean and quantile regression models. We present a simulation study and an empirical application to an observational study on the effects of right-heart catheterization. The proposed method is implemented in the R package RCALsa.
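A simplified version of the population bounds can be sketched for the marginal sensitivity model. Under that model the true inverse-propensity weight lies in an interval around 1/e(x), and the extremal Hajek-weighted mean is attained by assigning the high weight above an outcome threshold, so scanning all cuts of the sorted outcomes gives the bounds. This is a plain population-bound sketch under the stated weight intervals, not the paper's quantile-regression-based sharp bounds or its regularized calibrated estimation.

```python
import numpy as np

def msm_hajek_bounds(y, e, lam):
    """Bounds on E[Y(1)] from treated units under the marginal
    sensitivity model: true inverse-propensity weights lie in
    [1 + (1/e - 1)/lam, 1 + lam*(1/e - 1)].  The extremal Hajek mean
    puts the high weight above an outcome threshold, so scan all cuts."""
    y, e = np.asarray(y, float), np.asarray(e, float)
    idx = np.argsort(y)
    y, e = y[idx], e[idx]
    w_lo = 1 + (1 / e - 1) / lam
    w_hi = 1 + lam * (1 / e - 1)
    upper, lower = -np.inf, np.inf
    for k in range(len(y) + 1):
        # Upper bound: low weights below the cut, high weights above it;
        # the lower bound flips the assignment.
        w_up = np.concatenate([w_lo[:k], w_hi[k:]])
        w_dn = np.concatenate([w_hi[:k], w_lo[k:]])
        upper = max(upper, np.sum(w_up * y) / np.sum(w_up))
        lower = min(lower, np.sum(w_dn * y) / np.sum(w_dn))
    return lower, upper

rng = np.random.default_rng(2)
e = rng.uniform(0.2, 0.8, 200)          # propensity scores of treated units
y = rng.standard_normal(200) + 1 / e    # outcomes confounded with 1/e
lo, hi = msm_hajek_bounds(y, e, lam=1.5)
ipw = np.sum(y / e) / np.sum(1 / e)     # Hajek point estimate (lam = 1)
```

At lam = 1 the weight interval degenerates to 1/e(x) and the bounds collapse to the usual Hajek IPW estimate.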
"Model-assisted sensitivity analysis for treatment effects under unmeasured confounding via regularized calibrated estimation." Journal of the Royal Statistical Society Series B: Statistical Methodology 86(5): 1339-1363. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558804/pdf/
Pub Date: 2024-03-22 | eCollection Date: 2024-09-01 | DOI: 10.1093/jrsssb/qkae023
Eardi Lila, Wenbo Zhang, Swati Rane Levendovszky
We introduce a novel framework for the classification of functional data supported on nonlinear, and possibly random, manifold domains. The motivating application is the identification of subjects with Alzheimer's disease from their cortical surface geometry and associated cortical thickness map. The proposed model is based upon a reformulation of the classification problem as a regularized multivariate functional linear regression model. This allows us to adopt a direct approach to the estimation of the most discriminant direction while controlling for its complexity with appropriate differential regularization. Our approach does not require prior estimation of the covariance structure of the functional predictors, which is computationally prohibitive in our application setting. We provide a theoretical analysis of the out-of-sample prediction error of the proposed model and explore the finite sample performance in a simulation setting. We apply the proposed method to a pooled dataset from Alzheimer's Disease Neuroimaging Initiative and Parkinson's Progression Markers Initiative. Through this application, we identify discriminant directions that capture both cortical geometric and thickness predictive features of Alzheimer's disease that are consistent with the existing neuroscience literature.
"Interpretable discriminant analysis for functional data supported on random nonlinear domains with an application to Alzheimer's disease." Journal of the Royal Statistical Society Series B: Statistical Methodology 86(4): 1013-1044. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398888/pdf/
Pub Date: 2024-03-14 | eCollection Date: 2024-09-01 | DOI: 10.1093/jrsssb/qkae024
Ting Ye, Zhonghua Liu, Baoluo Sun, Eric Tchetgen Tchetgen
Mendelian randomization (MR) addresses causal questions using genetic variants as instrumental variables. We propose a new MR method, G-Estimation under No Interaction with Unmeasured Selection (GENIUS)-MAny Weak Invalid IV, which simultaneously addresses the 2 salient challenges in MR: many weak instruments and widespread horizontal pleiotropy. Similar to MR-GENIUS, we use heteroscedasticity of the exposure to identify the treatment effect. We derive influence functions of the treatment effect, and then we construct a continuous updating estimator and establish its asymptotic properties under a many weak invalid instruments asymptotic regime by developing novel semiparametric theory. We also provide a measure of weak identification, an overidentification test, and a graphical diagnostic tool.
"GENIUS-MAWII: for robust Mendelian randomization with many weak invalid instruments." Journal of the Royal Statistical Society Series B: Statistical Methodology 86(4): 1045-1067. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398887/pdf/
Pub Date: 2024-03-04 | eCollection Date: 2024-09-01 | DOI: 10.1093/jrsssb/qkae009
Yachong Yang, Arun Kumar Kuchibhotla, Eric Tchetgen Tchetgen
Conformal prediction has received tremendous attention in recent years and has offered new solutions to problems in missing data and causal inference; yet these advances have not leveraged modern semi-parametric efficiency theory for more efficient uncertainty quantification. We consider the problem of obtaining well-calibrated prediction regions that can data adaptively account for a shift in the distribution of covariates between training and test data. Under a covariate shift assumption analogous to the standard missing at random assumption, we propose a general framework based on efficient influence functions to construct well-calibrated prediction regions for the unobserved outcome in the test sample without compromising coverage.
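The baseline that this work improves on is weighted split conformal prediction under covariate shift, where calibration scores are reweighted by the covariate likelihood ratio and the threshold is a weighted quantile with a point mass at infinity for the test point. The sketch below shows that baseline quantile computation, not the paper's efficient-influence-function calibration.

```python
import numpy as np

def weighted_conformal_quantile(scores, w_cal, w_test, alpha):
    """Level-(1 - alpha) quantile of the weighted score distribution,
    with a point mass at +inf for the test point (weighted split
    conformal).  Weights are covariate likelihood ratios
    dP_test/dP_train evaluated at the calibration and test points."""
    order = np.argsort(scores)
    s, w = np.asarray(scores, float)[order], np.asarray(w_cal, float)[order]
    cum = np.cumsum(w) / (w.sum() + w_test)
    idx = np.searchsorted(cum, 1 - alpha)
    return np.inf if idx >= len(s) else s[idx]

# With uniform weights this reduces to the standard split-conformal
# threshold: the ceil((n + 1)(1 - alpha))-th smallest score.
scores = np.arange(1.0, 10.0)                       # n = 9 calibration scores
q = weighted_conformal_quantile(scores, np.ones(9), 1.0, alpha=0.25)
q_inf = weighted_conformal_quantile(scores, np.ones(9), 1.0, alpha=0.05)
```

When the requested coverage exceeds what the calibration mass can certify, the quantile is +inf and the prediction region is the whole outcome space, which is how finite-sample validity is preserved.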
"Doubly robust calibration of prediction sets under covariate shift." Journal of the Royal Statistical Society Series B: Statistical Methodology 86(4): 943-965. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398884/pdf/
Pub Date: 2024-01-22 | eCollection Date: 2024-07-01 | DOI: 10.1093/jrsssb/qkad140
Yaqing Chen, Shu-Chin Lin, Yang Zhou, Owen Carmichael, Hans-Georg Müller, Jane-Ling Wang
Quantifying the association between components of multivariate random curves is of general interest and is a ubiquitous and basic problem that can be addressed with functional data analysis. An important application is the problem of assessing functional connectivity based on functional magnetic resonance imaging (fMRI), where one aims to determine the similarity of fMRI time courses that are recorded on anatomically separated brain regions. In the functional brain connectivity literature, the static temporal Pearson correlation has been the prevailing measure for functional connectivity. However, recent research has revealed temporally changing patterns of functional connectivity, leading to the study of dynamic functional connectivity. This motivates new similarity measures for pairs of random curves that reflect the dynamic features of functional similarity. Specifically, we introduce gradient synchronization measures in a general setting. These similarity measures are based on the concordance and discordance of the gradients between paired smooth random functions. Asymptotic normality of the proposed estimates is obtained under regularity conditions. We illustrate the proposed synchronization measures via simulations and an application to resting-state fMRI signals from the Alzheimer's Disease Neuroimaging Initiative and they are found to improve discrimination between subjects with different disease status.
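The concordance/discordance idea can be sketched in discrete form: average the product of the gradient signs of two sampled curves, giving +1 for perfectly co-moving curves and -1 for anti-phase ones. This is a simplified discretized version of the population synchronization measures defined in the paper for smooth random functions.

```python
import numpy as np

def gradient_sync(x, y):
    """Discretized gradient-synchronization sketch: mean concordance
    (+1) minus discordance (-1) of the two curves' gradient signs over
    the sampling grid."""
    gx, gy = np.gradient(np.asarray(x, float)), np.gradient(np.asarray(y, float))
    return float(np.mean(np.sign(gx) * np.sign(gy)))

t = np.linspace(0, 2 * np.pi, 101)
sync_pos = gradient_sync(np.sin(t), np.sin(t))     # co-moving curves -> near +1
sync_neg = gradient_sync(np.sin(t), -np.sin(t))    # anti-phase curves -> near -1
```

Because only gradient signs enter, the measure is invariant to the amplitudes of the two curves, unlike the static Pearson correlation it is contrasted with.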
"Gradient synchronization for multivariate functional data, with application to brain connectivity." Journal of the Royal Statistical Society Series B: Statistical Methodology 86(3): 694-713. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11239314/pdf/
Pub Date: 2023-12-15 | eCollection Date: 2024-04-01 | DOI: 10.1093/jrsssb/qkad132
Naoki Egami, Eric J Tchetgen Tchetgen
Identification and estimation of causal peer effects are challenging in observational studies for two reasons. The first is the identification challenge due to unmeasured network confounding, for example, homophily bias and contextual confounding. The second is network dependence of observations. We establish a framework that leverages a pair of negative control outcome and exposure variables (double negative controls) to non-parametrically identify causal peer effects in the presence of unmeasured network confounding. We then propose a generalised method of moments estimator and establish its consistency and asymptotic normality under an assumption about ψ-network dependence. Finally, we provide a consistent variance estimator.
"Identification and estimation of causal peer effects using double negative controls for unmeasured network confounding." Journal of the Royal Statistical Society Series B: Statistical Methodology 86(2): 487-511. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11009281/pdf/
Controlling the False Discovery Rate (FDR) in a variable selection procedure is critical for reproducible discoveries, and it has been extensively studied in sparse linear models. However, it remains largely open in scenarios where the sparsity constraint is not directly imposed on the parameters but on a linear transformation of the parameters to be estimated. Examples of such scenarios include total variations, wavelet transforms, fused LASSO, and trend filtering. In this paper, we propose a data-adaptive FDR control method, called the Split Knockoff method, for this transformational sparsity setting. The proposed method exploits both variable and data splitting. The linear transformation constraint is relaxed to its Euclidean proximity in a lifted parameter space, which yields an orthogonal design that enables the orthogonal Split Knockoff construction. To overcome the challenge that exchangeability fails due to the heterogeneous noise brought by the transformation, new inverse supermartingale structures are developed via data splitting for provable FDR control without sacrificing power. Simulation experiments demonstrate that the proposed methodology achieves the desired FDR and power. We also provide an application to an Alzheimer's Disease study, where atrophied brain regions and their abnormal connections can be discovered based on a structural Magnetic Resonance Imaging dataset.
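The transformations listed above (total variation, fused LASSO, trend filtering) share one form: sparsity is imposed on D @ beta for a discrete difference operator D rather than on beta itself. A minimal sketch of that setting, constructing D and checking the induced sparsity patterns, is below; this illustrates the transformational-sparsity setup, not the Split Knockoff construction itself.

```python
import numpy as np

def difference_operator(p, order=1):
    """Order-k discrete difference matrix D of shape (p - k, p).
    Sparsity of D @ beta encodes total variation / fused LASSO (k = 1)
    or trend filtering (k >= 2) -- the 'transformational sparsity'
    that Split Knockoffs targets."""
    D = np.eye(p)
    for _ in range(order):
        D = np.diff(D, axis=0)   # each pass maps rows to successive differences
    return D

beta_pc = np.array([0.0, 0.0, 0.0, 2.0, 2.0, 2.0])   # piecewise-constant signal
tv = difference_operator(6, 1) @ beta_pc             # single nonzero jump
beta_lin = np.arange(6.0)                            # linear trend
tf = difference_operator(6, 2) @ beta_lin            # second differences vanish
```

Because D is rectangular and non-orthogonal, knockoff exchangeability fails in this parameterization, which is the obstacle the lifted-space relaxation in the paper is designed to remove.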
Yang Cao, Xinwei Sun, Yuan Yao. "Controlling the false discovery rate in transformational sparsity: Split Knockoffs." Journal of the Royal Statistical Society Series B: Statistical Methodology, published 2023-11-14. DOI: 10.1093/jrsssb/qkad126