Pub Date: 2025-01-01; Epub Date: 2025-01-13; DOI: 10.1214/24-ejs2341
Xi Ning, Yanqing Sun, Yinghao Pan, Peter B Gilbert
Partly interval-censored data, comprising exact and interval-censored observations, are prevalent in biomedical, clinical, and epidemiological studies. This paper studies a flexible class of semiparametric Cox-Aalen transformation models for regression analysis of such data. These models offer a versatile framework by accommodating both multiplicative and additive covariate effects and both constant and time-varying effects within a transformation, while also allowing for potentially time-dependent covariates. Moreover, this class of models includes many popular models such as the semiparametric transformation model, the Cox-Aalen model, the stratified Cox model, and the stratified proportional odds model as special cases. To facilitate efficient computation, we formulate a set of estimating equations and propose an Expectation-Solving (ES) algorithm that guarantees stability and rapid convergence. Under mild regularity assumptions, the resulting estimator is shown to be consistent and asymptotically normal. The validity of the weighted bootstrap is also established. A supremum test is proposed to test the time-varying covariate effects. Finally, the proposed method is evaluated through comprehensive simulations and applied to data from a randomized HIV/AIDS trial.
Title: Regression analysis of semiparametric Cox-Aalen transformation models with partly interval-censored data. Electronic Journal of Statistics 19(1), 240-290 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11828658/pdf/
Pub Date: 2025-01-01; Epub Date: 2025-09-05; DOI: 10.1214/25-ejs2429
Yiling Huang, Snigdha Panigrahi, Walter Dempsey
Neighborhood selection is a widely used method for estimating the support set of sparse precision matrices, which helps determine the conditional dependence structure in undirected graphical models. However, reporting only point estimates for the estimated graph, without accompanying uncertainty estimates, can result in poor replicability. In fields such as psychology, where the lack of replicability is a major concern, there is a growing need for methods that can address this issue. In this paper, we focus on the Gaussian graphical model. We introduce a selective inference method to attach uncertainty estimates to the selected (nonzero) entries of the precision matrix and decide which of the estimated edges must be included in the graph. Our method provides an exact adjustment for the selection of edges, which, when multiplied with the Wishart density of the random matrix, results in valid selective inferences. Through the use of externally added randomization variables, our adjustment is easy to compute: it only requires calculating the probability of a selection event that is equivalent to a few sign constraints and that decouples across the nodewise regressions. Through simulations and an application to a mobile health trial designed to study mental health, we demonstrate that our selective inference method results in higher power and improved estimation accuracy.
Title: Selective Inference for Sparse Graphs via Neighborhood Selection. Electronic Journal of Statistics 19(2), 4083-4116 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12631974/pdf/
Pub Date: 2024-01-01; Epub Date: 2024-08-27; DOI: 10.1214/24-ejs2275
Bohao Tang, Sandipan Pramanik, Yi Zhao, Brian Caffo, Abhirup Datta
In this manuscript, we study scalar-on-distribution regression; that is, instances where subject-specific distributions or densities are the covariates, related to a scalar outcome via a regression model. In practice, only repeated measures are observed from those covariate distributions, and common approaches first use these to estimate subject-specific density functions, which are then used as covariates in standard scalar-on-function regression. We propose a simple and direct method for linear scalar-on-distribution regression that circumvents the intermediate step of estimating subject-specific covariate densities. We show that one can directly use the observed repeated measures as covariates and endow the regression function with a Gaussian process prior to obtain a closed form or conjugate Bayesian inference. Our method subsumes the standard Bayesian non-parametric regression using Gaussian processes as a special case, corresponding to covariates being Dirac distributions. The model is also invariant to any transformation or ordering of the repeated measures. Theoretically, we show that, despite only using the observed repeated measures from the true density-valued covariates that generated the data, the method can achieve an optimal estimation error bound of the regression function. The theory extends beyond i.i.d. settings to accommodate certain forms of within-subject dependence among the repeated measures. To our knowledge, this is the first theoretical study on Bayesian regression using distribution-valued covariates. We propose numerous extensions, including a scalable implementation using low-rank Gaussian processes and a generalization to non-linear scalar-on-distribution regression. Through simulation studies, we demonstrate that our method performs substantially better than approaches that require an intermediate density estimation step, especially with a small number of repeated measures per subject. We apply our method to study the association of age with activity counts.
Title: Direct Bayesian linear regression for distribution-valued covariates. Electronic Journal of Statistics 18(2), 3327-3375 (2024). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466299/pdf/
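One possible reading of the direct approach in the abstract above: if the linear model has the form E[y_i] = ∫ β(x) dF_i(x) and β is given a Gaussian process prior with kernel K, then replacing each F_i by the empirical distribution of subject i's repeated measures induces a between-subject covariance equal to the average of K over all pairs of measurements. The sketch below illustrates only that averaged kernel. The model form, the RBF kernel, and the names `rbf` and `subject_cov` are assumptions made for illustration, not the authors' implementation.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel on scalars (an assumed choice of K)."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def subject_cov(xs_i, xs_j, kernel=rbf):
    """Prior covariance between two subjects' mean regression values.

    Averages the kernel over every pair of repeated measures, which is
    the covariance of (1/m_i) * sum_k beta(x_ik) and
    (1/m_j) * sum_l beta(x_jl) when beta ~ GP(0, kernel).
    """
    total = sum(kernel(a, b) for a in xs_i for b in xs_j)
    return total / (len(xs_i) * len(xs_j))


# Hypothetical repeated measures for two subjects.
subject_a = [0.1, 0.5, 0.9]
subject_b = [0.2, 0.4]
cov_ab = subject_cov(subject_a, subject_b)
```

Under these assumptions, two properties mentioned in the abstract are easy to see: reordering a subject's measurements leaves `subject_cov` unchanged, and with a single measurement per subject (a Dirac covariate) it reduces to the ordinary kernel value used in standard Gaussian process regression.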
Pub Date: 2024-01-01; Epub Date: 2024-11-22; DOI: 10.1214/24-ejs2311
Lu Mao
The marginal inference of an outcome variable can be improved by closely related covariates with a structured distribution. This differs from standard covariate adjustment in randomized trials, which exploits covariate-treatment independence rather than knowledge of the covariate distribution. Yet it can also be done robustly against misspecification of the outcome-covariate relationship. Starting with a standard estimating function involving only the outcome, we first use a working regression model to compute its conditional expectation given the covariates, and then remove the uninformative part under the covariate distribution model. This effectively projects the initial function onto the joint tangent space of the full data, thereby achieving local efficiency when the regression model is correct. Importantly, even with a faulty working model, the estimator remains unbiased, as the subtracted term is always asymptotically centered. Further improvement is possible if the outcome distribution also has its own structure. To demonstrate the process, we consider three examples: one with fully parametric covariates, one with a covariate following a partial parametric model against others, and another with mutually independent covariates.
Title: Robust improvement of efficiency using information on covariate distribution. Electronic Journal of Statistics 18(2), 4640-4666 (2024). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11633646/pdf/
Pub Date: 2024-01-01; Epub Date: 2024-03-28; DOI: 10.1214/24-ejs2237
Jonathan H Huggins, Jeffrey W Miller
Under model misspecification, it is known that Bayesian posteriors often do not properly quantify uncertainty about true or pseudo-true parameters. Even more fundamentally, misspecification leads to a lack of reproducibility in the sense that the same model will yield contradictory posteriors on independent data sets from the true distribution. To define a criterion for reproducible uncertainty quantification under misspecification, we consider the probability that two credible sets constructed from independent data sets have nonempty overlap, and we establish a lower bound on this overlap probability that holds whenever the credible sets are valid confidence sets. We prove that credible sets from the standard posterior can strongly violate this bound, indicating that the standard posterior is not internally coherent under misspecification. To improve reproducibility in an easy-to-use and widely applicable way, we propose to apply bagging to the Bayesian posterior ("BayesBag"); that is, to use the average of posterior distributions conditioned on bootstrapped datasets. We motivate BayesBag from first principles based on Jeffrey conditionalization and show that the bagged posterior typically satisfies the overlap lower bound. Further, we prove a Bernstein-von Mises theorem for the bagged posterior, establishing its asymptotic normal distribution. We demonstrate the benefits of BayesBag via simulation experiments and an application to crime rate prediction.
Title: Reproducible parameter inference using bagged posteriors. Electronic Journal of Statistics 18(1), 1549-1585 (2024). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12588188/pdf/
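The bagging idea in the abstract above (averaging posteriors conditioned on bootstrapped datasets) can be sketched for a toy conjugate model. The example below is a minimal illustration, not the authors' code: it assumes a normal likelihood with known variance and a normal prior on the mean, and the function names, prior values, and simulated data are all hypothetical. The bagged posterior is an equal-weight mixture of the bootstrap posteriors, so its variance follows from the law of total variance.

```python
import random
import statistics


def normal_posterior(data, sigma2=1.0, mu0=0.0, tau02=100.0):
    """Conjugate posterior (mean, variance) of a normal mean.

    Assumes y_i ~ N(theta, sigma2) with sigma2 known and a
    N(mu0, tau02) prior on theta.
    """
    precision = 1.0 / tau02 + len(data) / sigma2
    mean = (mu0 / tau02 + sum(data) / sigma2) / precision
    return mean, 1.0 / precision


def bagged_posterior(data, n_boot=200, seed=0, **prior):
    """Mean and variance of the average of bootstrap posteriors."""
    rng = random.Random(seed)
    n = len(data)
    posts = [
        normal_posterior([rng.choice(data) for _ in range(n)], **prior)
        for _ in range(n_boot)
    ]
    means = [m for m, _ in posts]
    variances = [v for _, v in posts]
    # Mixture variance = average within-posterior variance
    # + variance of the posterior means across bootstrap datasets.
    bag_mean = statistics.fmean(means)
    bag_var = statistics.fmean(variances) + statistics.pvariance(means)
    return bag_mean, bag_var


# Hypothetical misspecified setting: the data have standard deviation 3,
# but the model assumes sigma2 = 1.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 3.0) for _ in range(50)]
std_mean, std_var = normal_posterior(data)
bag_mean, bag_var = bagged_posterior(data)
```

In this toy setting the standard posterior variance depends only on the sample size, so it ignores the extra spread in the data, whereas the bagged variance adds the bootstrap variability of the posterior mean and is therefore never smaller.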
F. Comte, C. Duval
Title: Should we estimate a product of density functions by a product of estimators? Electronic Journal of Statistics (2023). DOI: https://doi.org/10.1214/23-ejs2103
Teng Wu, Naveen N. Narisetty, Yun Yang
Title: Statistical inference via conditional Bayesian posteriors in high-dimensional linear regression. Electronic Journal of Statistics (2023). DOI: https://doi.org/10.1214/23-ejs2113
Researchers commonly encounter large-scale networks in practice (e.g., Facebook and Twitter). To study the interactions between nodes of such networks, the spatial autoregressive (SAR) model is widely employed. Despite its popularity, the estimation of a SAR model on large-scale networks remains very challenging. On the one hand, due to policy limitations or high collection costs, it is often impossible for independent researchers to observe or collect all network information. On the other hand, even if the entire network is accessible, estimating the SAR model using the quasi-maximum likelihood estimator (QMLE) could be infeasible due to its high computational cost. To address these challenges, we propose a subnetwork estimation method based on QMLE for the SAR model. By using appropriate sampling methods, a subnetwork consisting of a much-reduced number of nodes can be constructed. Subsequently, the standard QMLE can be computed by treating the sampled subnetwork as if it were the entire network. This leads to a significant reduction in information collection and model computation costs, which makes the approach more feasible in practice. Theoretically, we show that the subnetwork-based QMLE is consistent and asymptotically normal under appropriate regularity conditions. Extensive simulation studies, based on both simulated and real network structures, are presented.
Xuetong Li, Feifei Wang, Wei Lan, Hansheng Wang
Title: Subnetwork estimation for spatial autoregressive models in large-scale networks. Electronic Journal of Statistics (2023). DOI: https://doi.org/10.1214/23-ejs2139
Kwun Chuen Gary Chan, Hok Kan Ling, Sheung Chi Phillip Yam
Title: On nonparametric estimation for cross-sectional sampled data under stationarity. Electronic Journal of Statistics (2023). DOI: https://doi.org/10.1214/23-ejs2163
Envelope methods offer targeted dimension reduction for various statistical models. The goal is to improve efficiency in multivariate parameter estimation by projecting the data onto a lower-dimensional subspace known as the envelope. Envelope approaches have advantages in analyzing data with highly correlated variables, but their iterative Grassmannian optimization algorithms do not scale well to high-dimensional data. While the connections between envelopes and partial least squares in multivariate linear regression have promoted recent progress in high-dimensional studies of envelopes, we propose a more straightforward way of envelope modeling from a new principal component regression perspective. The proposed procedure, Non-Iterative Envelope Component Estimation (NIECE), has excellent computational advantages over the iterative Grassmannian optimization alternatives in high dimensions. We develop a unified theory that bridges the gap between envelope methods and principal components in regression. The new theoretical insights also shed light on the envelope subspace estimation error as a function of eigenvalue gaps of two symmetric positive definite matrices used in envelope modeling. We apply the new theory and algorithm to several envelope models, including response and predictor reduction in multivariate linear models, logistic regression, and the Cox proportional hazards model. Simulations and illustrative data analysis show the potential for NIECE to significantly improve on standard methods in linear and generalized linear models.
Xin Zhang, Kai Deng, Qing Mai
Title: Envelopes and principal component regression. Electronic Journal of Statistics (2023). DOI: https://doi.org/10.1214/23-ejs2154