
Electronic Journal of Statistics: Latest Articles

Regression analysis of semiparametric Cox-Aalen transformation models with partly interval-censored data.
IF 1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-01-01 | Epub Date: 2025-01-13 | DOI: 10.1214/24-ejs2341
Xi Ning, Yanqing Sun, Yinghao Pan, Peter B Gilbert

Partly interval-censored data, comprising exact and interval-censored observations, are prevalent in biomedical, clinical, and epidemiological studies. This paper studies a flexible class of semiparametric Cox-Aalen transformation models for regression analysis of such data. These models offer a versatile framework by accommodating both multiplicative and additive covariate effects and both constant and time-varying effects within a transformation, while also allowing for potentially time-dependent covariates. Moreover, this class of models includes many popular models such as the semiparametric transformation model, the Cox-Aalen model, the stratified Cox model, and the stratified proportional odds model as special cases. To facilitate efficient computation, we formulate a set of estimating equations and propose an Expectation-Solving (ES) algorithm that guarantees stability and rapid convergence. Under mild regularity assumptions, the resulting estimator is shown to be consistent and asymptotically normal. The validity of the weighted bootstrap is also established. A supremum test is proposed to test the time-varying covariate effects. Finally, the proposed method is evaluated through comprehensive simulations and applied to analyze data from a randomized HIV/AIDS trial.

Electronic Journal of Statistics, vol. 19, no. 1, pp. 240-290. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11828658/pdf/
Citations: 0
Direct Bayesian linear regression for distribution-valued covariates.
IF 1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-01-01 | Epub Date: 2024-08-27 | DOI: 10.1214/24-ejs2275
Bohao Tang, Sandipan Pramanik, Yi Zhao, Brian Caffo, Abhirup Datta

In this manuscript, we study scalar-on-distribution regression; that is, instances where subject-specific distributions or densities are the covariates, related to a scalar outcome via a regression model. In practice, only repeated measures are observed from those covariate distributions and common approaches first use these to estimate subject-specific density functions, which are then used as covariates in standard scalar-on-function regression. We propose a simple and direct method for linear scalar-on-distribution regression that circumvents the intermediate step of estimating subject-specific covariate densities. We show that one can directly use the observed repeated measures as covariates and endow the regression function with a Gaussian process prior to obtain a closed form or conjugate Bayesian inference. Our method subsumes the standard Bayesian non-parametric regression using Gaussian processes as a special case, corresponding to covariates being Dirac-distributions. The model is also invariant to any transformation or ordering of the repeated measures. Theoretically, we show that, despite only using the observed repeated measures from the true density-valued covariates that generated the data, the method can achieve an optimal estimation error bound of the regression function. The theory extends beyond i.i.d. settings to accommodate certain forms of within-subject dependence among the repeated measures. To our knowledge, this is the first theoretical study on Bayesian regression using distribution-valued covariates. We propose numerous extensions including a scalable implementation using low-rank Gaussian processes and a generalization to non-linear scalar-on-distribution regression. Through simulation studies, we demonstrate that our method performs substantially better than approaches that require an intermediate density estimation step especially with a small number of repeated measures per subject. 
We apply our method to study association of age with activity counts.
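
The direct approach described in this abstract can be illustrated with a minimal sketch. The details below are assumptions made for illustration and are not taken from the paper: the outcome is modeled as y_i = (1/m) Σ_j β(x_ij) + noise, β has a zero-mean Gaussian process prior with a squared-exponential kernel, and the noise variance is treated as known. The names `rbf` and `posterior_mean` and all hyperparameter values are hypothetical.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def posterior_mean(samples, y, grid, noise_var=0.1, length=1.0):
    """Posterior mean of the regression function beta on `grid`.

    samples: list of 1-D arrays, the repeated measures for each subject.
    Assumed model (illustrative): y_i = mean_j beta(x_ij) + N(0, noise_var).
    """
    n = len(samples)
    # Cov(y_i, y_j) under the GP prior: kernel averaged over all pairs of measures.
    K = np.array([[rbf(samples[i], samples[j], length).mean() for j in range(n)]
                  for i in range(n)])
    # Cov(beta(g), y_i): kernel averaged over subject i's measures.
    k_star = np.stack([rbf(grid, s, length).mean(axis=1) for s in samples], axis=1)
    alpha = np.linalg.solve(K + noise_var * np.eye(n), y)
    return k_star @ alpha

rng = np.random.default_rng(0)
# 40 subjects, 5 repeated measures each, drawn from subject-specific normals.
centers = rng.uniform(-2, 2, size=40)
samples = [c + 0.5 * rng.standard_normal(5) for c in centers]
y = np.array([np.sin(s).mean() for s in samples]) + 0.1 * rng.standard_normal(40)
grid = np.linspace(-2, 2, 9)
beta_hat = posterior_mean(samples, y, grid)
```

Because the kernel is averaged over each subject's measures, reordering the measures leaves the result unchanged, and with a single measure per subject the computation reduces to ordinary Gaussian process regression. Both properties match what the abstract states about the method, although the paper's actual construction may differ from this sketch.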

Electronic Journal of Statistics, vol. 18, no. 2, pp. 3327-3375. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466299/pdf/
Citations: 0
Robust improvement of efficiency using information on covariate distribution.
IF 1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-01-01 | Epub Date: 2024-11-22 | DOI: 10.1214/24-ejs2311
Lu Mao

The marginal inference of an outcome variable can be improved by closely related covariates with a structured distribution. This differs from standard covariate adjustment in randomized trials, which exploits covariate-treatment independence rather than knowledge on the covariate distribution. Yet it can also be done robustly against misspecification of the outcome-covariate relationship. Starting with a standard estimating function involving only the outcome, we first use a working regression model to compute its conditional expectation given the covariates, and then remove the uninformative part under the covariate distribution model. This effectively projects the initial function onto the joint tangent space of the full data, thereby achieving local efficiency when the regression model is correct. Importantly, even with a faulty working model, the estimator remains unbiased as the subtracted term is always asymptotically centered. Further improvement is possible if the outcome distribution also has its own structure. To demonstrate the process, we consider three examples: one with fully parametric covariates, one with a covariate following a partial parametric model against others, and another with mutually independent covariates.
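
A toy version of the two-step recipe in this abstract can be sketched as follows. Everything specific here is an assumption for illustration, not the paper's estimator: the target is the mean of Y, there is a single covariate X known to follow N(0, 1), and the working regression model is a straight line fitted by least squares. The function name `adjusted_mean` is hypothetical.

```python
import numpy as np

def adjusted_mean(y, x):
    """Mean of y, adjusted with a linear working model and X ~ N(0, 1) known."""
    b, a = np.polyfit(x, y, 1)          # working model m(x) = a + b * x
    fitted_avg = (a + b * x).mean()     # sample average of m(X)
    known_avg = a                       # E[m(X)] under N(0, 1), since E[X] = 0
    return y.mean() - (fitted_avg - known_avg)

rng = np.random.default_rng(1)
naive, adjusted = [], []
for _ in range(2000):
    x = rng.standard_normal(100)
    # True relationship is nonlinear, so the linear working model is misspecified.
    y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.standard_normal(100)
    naive.append(y.mean())
    adjusted.append(adjusted_mean(y, x))

naive, adjusted = np.array(naive), np.array(adjusted)
print("true mean 1.5")
print("naive:    mean %.3f, sd %.3f" % (naive.mean(), naive.std()))
print("adjusted: mean %.3f, sd %.3f" % (adjusted.mean(), adjusted.std()))
```

The subtracted term compares the sample average of the fitted values with their expectation under the known covariate distribution, so it is centered near zero whether or not the working model is right. In this simulation the working model is deliberately wrong, yet the adjusted estimator should still center near the true mean while varying less than the plain sample mean.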

Electronic Journal of Statistics, vol. 18, no. 2, pp. 4640-4666. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11633646/pdf/
Citations: 0
Should we estimate a product of density functions by a product of estimators?
IF 1.1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2103
F. Comte, C. Duval
Citations: 0
Statistical inference via conditional Bayesian posteriors in high-dimensional linear regression
IF 1.1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2113
Teng Wu, Naveen N. Narisetty, Yun Yang
Citations: 2
Subnetwork estimation for spatial autoregressive models in large-scale networks
IF 1.1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2139
Xuetong Li, Feifei Wang, Wei Lan, Hansheng Wang
Large-scale networks are commonly encountered in practice (e.g., Facebook and Twitter) by researchers. In order to study the network interaction between different nodes of large-scale networks, the spatial autoregressive (SAR) model has been popularly employed. Despite its popularity, the estimation of a SAR model on large-scale networks remains very challenging. On the one hand, due to policy limitations or high collection costs, it is often impossible for independent researchers to observe or collect all network information. On the other hand, even if the entire network is accessible, estimating the SAR model using the quasi-maximum likelihood estimator (QMLE) could be computationally infeasible due to its high computational cost. To address these challenges, we propose here a subnetwork estimation method based on QMLE for the SAR model. By using appropriate sampling methods, a subnetwork, consisting of a much-reduced number of nodes, can be constructed. Subsequently, the standard QMLE can be computed by treating the sampled subnetwork as if it were the entire network. This leads to a significant reduction in information collection and model computation costs, which increases the practical feasibility of the effort. Theoretically, we show that the subnetwork-based QMLE is consistent and asymptotically normal under appropriate regularity conditions. Extensive simulation studies, based on both simulated and real network structures, are presented.
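
The workflow described in this abstract (sample nodes, build the subnetwork, then run the standard QMLE on it as if it were the whole network) can be sketched roughly as below. This is an illustration under assumptions, not the authors' implementation: the model is taken to be Y = ρWY + Xβ + ε with a row-normalized W, the QMLE is found by a simple grid search over ρ on the concentrated Gaussian log-likelihood, and nodes are sampled uniformly at random, which stands in for whatever sampling methods the paper considers appropriate. The names `row_normalize` and `sar_qmle` are hypothetical.

```python
import numpy as np

def row_normalize(A):
    """Divide each row by its sum; rows with no links stay zero."""
    s = A.sum(axis=1, keepdims=True)
    return np.divide(A, s, out=np.zeros_like(A), where=s > 0)

def sar_qmle(Y, X, W, grid=np.linspace(-0.95, 0.95, 191)):
    """Grid-search QMLE for Y = rho*W*Y + X*beta + eps (concentrated likelihood)."""
    n = len(Y)
    I = np.eye(n)
    P = X @ np.linalg.pinv(X)            # projection onto the columns of X
    best_rho, best_ll = None, -np.inf
    for rho in grid:
        S = I - rho * W
        resid = (I - P) @ (S @ Y)        # beta profiled out by least squares
        sigma2 = resid @ resid / n
        sign, logdet = np.linalg.slogdet(S)
        ll = logdet - 0.5 * n * np.log(sigma2)
        if sign > 0 and ll > best_ll:
            best_rho, best_ll = rho, ll
    beta = np.linalg.pinv(X) @ ((I - best_rho * W) @ Y)
    return best_rho, beta

rng = np.random.default_rng(2)
n, d, rho0, beta0 = 300, 10, 0.4, 1.0
A = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    A[i, rng.choice(others, size=d, replace=False)] = 1.0   # d out-links per node
W = row_normalize(A)
X = rng.standard_normal((n, 1))
eps = 0.05 * rng.standard_normal(n)
Y = np.linalg.solve(np.eye(n) - rho0 * W, X[:, 0] * beta0 + eps)

# Hypothetical sampling step: keep a uniform random half of the nodes,
# rebuild W from the induced subgraph, and fit as if it were the whole network.
keep = np.sort(rng.choice(n, size=n // 2, replace=False))
W_sub = row_normalize(A[np.ix_(keep, keep)])
rho_full, beta_full = sar_qmle(Y, X, W)
rho_sub, beta_sub = sar_qmle(Y[keep], X[keep], W_sub)
print("full network:", rho_full, beta_full)
print("subnetwork:  ", rho_sub, beta_sub)
```

The log-determinant of an n × n matrix is evaluated at every grid point, so the fit on the half-sized subnetwork is much cheaper than the fit on the full network. How close the subnetwork estimate comes to the full-network one depends on the sampling scheme; uniform node sampling is used here only as a placeholder, and the regularity conditions behind the paper's consistency result are not reproduced.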
Citations: 0
Variable selection for single-index varying-coefficients models with applications to synergistic G × E interactions
IF 1.1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2117
Shunjie Guan, Mingtao Zhao, Yuehua Cui
Citations: 0
Bootstrap adjusted predictive classification for identification of subgroups with differential treatment effects under generalized linear models
IF 1.1 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2108
Na Li, Yanglei Song, C. D. Lin, D. Tu
Citations: 0
On nonparametric estimation for cross-sectional sampled data under stationarity
Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2163
Kwun Chuen Gary Chan, Hok Kan Ling, Sheung Chi Phillip Yam
Citations: 0
Envelopes and principal component regression
Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2154
Xin Zhang, Kai Deng, Qing Mai
Envelope methods offer targeted dimension reduction for various statistical models. The goal is to improve efficiency in multivariate parameter estimation by projecting the data onto a lower-dimensional subspace known as the envelope. Envelope approaches have advantages in analyzing data with highly correlated variables, but their iterative Grassmannian optimization algorithms do not scale very well with high-dimensional data. While the connections between envelopes and partial least squares in multivariate linear regression have promoted recent progress in high-dimensional studies of envelopes, we propose a more straightforward way of envelope modeling from a new principal component regression perspective. The proposed procedure, Non-Iterative Envelope Component Estimation (NIECE), has excellent computational advantages over the iterative Grassmannian optimization alternatives in high dimensions. We develop a unified theory that bridges the gap between envelope methods and principal components in regression. The new theoretical insights also shed light on the envelope subspace estimation error as a function of eigenvalue gaps of two symmetric positive definite matrices used in envelope modeling. We apply the new theory and algorithm to several envelope models, including response and predictor reduction in multivariate linear models, logistic regression, and Cox proportional hazard model. Simulations and illustrative data analysis show the potential for NIECE to improve standard methods in linear and generalized linear models significantly.
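
The abstract does not spell out the steps of NIECE, so the sketch below shows only ordinary principal component regression, the standard technique the paper connects envelopes to. It is not the NIECE procedure. The function name `pcr_fit` and the simulated data are hypothetical and serve only to show what regressing on a lower-dimensional projection of the predictors looks like.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: regress y on the first k PCs of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                      # p x k basis of the reduced subspace
    scores = Xc @ V_k                   # n x k component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = V_k @ gamma                  # map back to the original predictors
    intercept = y_mean - x_mean @ coef
    return coef, intercept

rng = np.random.default_rng(3)
n, p = 200, 6
# Highly correlated predictors: two latent factors plus small noise.
Z = rng.standard_normal((n, 2))
X = Z @ rng.standard_normal((2, p)) + 0.05 * rng.standard_normal((n, p))
y = X @ np.ones(p) + 0.1 * rng.standard_normal(n)
coef_2, b_2 = pcr_fit(X, y, k=2)   # reduced fit on two components
coef_p, b_p = pcr_fit(X, y, k=p)   # k = p uses every component
```

With k equal to the number of predictors, the fit coincides with ordinary least squares; smaller k restricts the coefficient vector to the span of the leading principal directions. Plain PCR picks that subspace from the predictors alone, without looking at the response.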
Citations: 2