
Electronic Journal of Statistics: Latest Articles

Regression analysis of semiparametric Cox-Aalen transformation models with partly interval-censored data.
IF 1.3 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2025-01-01 | Epub Date: 2025-01-13 | DOI: 10.1214/24-ejs2341
Xi Ning, Yanqing Sun, Yinghao Pan, Peter B Gilbert

Partly interval-censored data, comprising exact and interval-censored observations, are prevalent in biomedical, clinical, and epidemiological studies. This paper studies a flexible class of the semiparametric Cox-Aalen transformation models for regression analysis of such data. These models offer a versatile framework by accommodating both multiplicative and additive covariate effects and both constant and time-varying effects within a transformation, while also allowing for potentially time-dependent covariates. Moreover, this class of models includes many popular models such as the semiparametric transformation model, the Cox-Aalen model, the stratified Cox model, and the stratified proportional odds model as special cases. To facilitate efficient computation, we formulate a set of estimating equations and propose an Expectation-Solving (ES) algorithm that guarantees stability and rapid convergence. Under mild regularity assumptions, the resulting estimator is shown to be consistent and asymptotically normal. The validity of the weighted bootstrap is also established. A supremum test is proposed to test the time-varying covariate effects. Finally, the proposed method is evaluated through comprehensive simulations and applied to analyze data from a randomized HIV/AIDS trial.

Electronic Journal of Statistics, 19(1), 240-290. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11828658/pdf/
Citations: 0
Selective Inference for Sparse Graphs via Neighborhood Selection.
IF 1.3 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2025-01-01 | Epub Date: 2025-09-05 | DOI: 10.1214/25-ejs2429
Yiling Huang, Snigdha Panigrahi, Walter Dempsey

Neighborhood selection is a widely used method for estimating the support set of sparse precision matrices, which helps determine the conditional dependence structure in undirected graphical models. However, reporting only point estimates for the estimated graph can result in poor replicability without accompanying uncertainty estimates. In fields such as psychology, where the lack of replicability is a major concern, there is a growing need for methods that can address this issue. In this paper, we focus on the Gaussian graphical model. We introduce a selective inference method to attach uncertainty estimates to the selected (nonzero) entries of the precision matrix and decide which of the estimated edges must be included in the graph. Our method provides an exact adjustment for the selection of edges, which, when multiplied with the Wishart density of the random matrix, results in valid selective inferences. Through the use of externally added randomization variables, our adjustment is easy to compute, requiring us to calculate the probability of a selection event that is equivalent to a few sign constraints and that decouples across the nodewise regressions. Through simulations and an application to a mobile health trial designed to study mental health, we demonstrate that our selective inference method results in higher power and improved estimation accuracy.
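
As background for the abstract above, the selection step itself (plain neighborhood selection, not the authors' selective inference adjustment) can be sketched as one lasso regression per node. The helper names, the coordinate-descent solver, the OR rule for combining nodewise results, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent; X is a list of columns."""
    n, p = len(y), len(X)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that leaves out coordinate j.
            r = [y[i] - sum(beta[k] * X[k][i] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[j][i] * r[i] for i in range(n)) / n
            z = sum(v * v for v in X[j]) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


def neighborhood_selection(columns, lam):
    """Return the edge set under the OR rule: keep (j, k) if either
    nodewise regression gives a nonzero coefficient."""
    p = len(columns)
    edges = set()
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso_cd([columns[k] for k in others], columns[j], lam)
        for k, b in zip(others, beta):
            if b != 0.0:
                edges.add((min(j, k), max(j, k)))
    return edges


# Toy centered data: variable 1 copies variable 0, variable 2 is orthogonal.
x0 = [1.0, -1.0, 1.0, -1.0]
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
edges = neighborhood_selection([x0, x1, x2], lam=0.1)
```

On this toy input only the pair (0, 1) is selected. The sketch reports a point estimate of the graph and attaches no uncertainty to it, which is the gap the abstract says the selective inference method addresses.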

Electronic Journal of Statistics, 19(2), 4083-4116. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12631974/pdf/
Citations: 0
Direct Bayesian linear regression for distribution-valued covariates.
IF 1 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2024-01-01 | Epub Date: 2024-08-27 | DOI: 10.1214/24-ejs2275
Bohao Tang, Sandipan Pramanik, Yi Zhao, Brian Caffo, Abhirup Datta

In this manuscript, we study scalar-on-distribution regression; that is, instances where subject-specific distributions or densities are the covariates, related to a scalar outcome via a regression model. In practice, only repeated measures are observed from those covariate distributions and common approaches first use these to estimate subject-specific density functions, which are then used as covariates in standard scalar-on-function regression. We propose a simple and direct method for linear scalar-on-distribution regression that circumvents the intermediate step of estimating subject-specific covariate densities. We show that one can directly use the observed repeated measures as covariates and endow the regression function with a Gaussian process prior to obtain a closed form or conjugate Bayesian inference. Our method subsumes the standard Bayesian non-parametric regression using Gaussian processes as a special case, corresponding to covariates being Dirac-distributions. The model is also invariant to any transformation or ordering of the repeated measures. Theoretically, we show that, despite only using the observed repeated measures from the true density-valued covariates that generated the data, the method can achieve an optimal estimation error bound of the regression function. The theory extends beyond i.i.d. settings to accommodate certain forms of within-subject dependence among the repeated measures. To our knowledge, this is the first theoretical study on Bayesian regression using distribution-valued covariates. We propose numerous extensions including a scalable implementation using low-rank Gaussian processes and a generalization to non-linear scalar-on-distribution regression. Through simulation studies, we demonstrate that our method performs substantially better than approaches that require an intermediate density estimation step, especially with a small number of repeated measures per subject. We apply our method to study association of age with activity counts.
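
One way to read "directly use the observed repeated measures as covariates" with a Gaussian process prior is sketched below. It assumes each subject's linear functional is approximated by the average of the regression function over that subject's repeated measures, so the induced covariance between two subjects is the base kernel averaged over all pairs of measures. The function names, the squared-exponential kernel, and the sample values are hypothetical, and this may not match the authors' exact construction.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Squared-exponential base kernel on scalar measurements."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def distribution_kernel(xs, zs, base=rbf):
    """Covariance between two subjects' linear functionals
    (1/m) * sum_j beta(x_j) when beta has a GP prior with kernel `base`:
    the average of the base kernel over all pairs of repeated measures."""
    total = sum(base(x, z) for x in xs for z in zs)
    return total / (len(xs) * len(zs))


subject_a = [0.1, 0.4, 0.9]
subject_b = [0.3, 0.8]
k_ab = distribution_kernel(subject_a, subject_b)
# Reordering the repeated measures leaves the kernel unchanged.
k_ab_shuffled = distribution_kernel([0.9, 0.1, 0.4], [0.8, 0.3])
# A single measure per subject (a Dirac covariate) recovers the base kernel.
k_dirac = distribution_kernel([0.5], [0.7])
```

The two checks in the sketch mirror two statements in the abstract: invariance to the ordering of repeated measures, and standard Gaussian process regression as the special case of Dirac-distributed covariates.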

Electronic Journal of Statistics, 18(2), 3327-3375. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466299/pdf/
Citations: 0
Robust improvement of efficiency using information on covariate distribution.
IF 1 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2024-01-01 | Epub Date: 2024-11-22 | DOI: 10.1214/24-ejs2311
Lu Mao

The marginal inference of an outcome variable can be improved by closely related covariates with a structured distribution. This differs from standard covariate adjustment in randomized trials, which exploits covariate-treatment independence rather than knowledge of the covariate distribution. Yet it can also be done robustly against misspecification of the outcome-covariate relationship. Starting with a standard estimating function involving only the outcome, we first use a working regression model to compute its conditional expectation given the covariates, and then remove the uninformative part under the covariate distribution model. This effectively projects the initial function onto the joint tangent space of the full data, thereby achieving local efficiency when the regression model is correct. Importantly, even with a faulty working model, the estimator remains unbiased as the subtracted term is always asymptotically centered. Further improvement is possible if the outcome distribution also has its own structure. To demonstrate the process, we consider three examples: one with fully parametric covariates, one with a covariate following a partial parametric model against others, and another with mutually independent covariates.
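
The idea of subtracting a term that is centered under the covariate model can be illustrated in a deliberately simplified setting. The sketch assumes the only structure known about the covariate is its mean `mu_x`, and it uses a working linear regression. It is a control-variate style stand-in, not the paper's estimator, and the function name and data are hypothetical.

```python
def adjusted_mean(ys, xs, mu_x):
    """Sample mean of Y minus the fitted-slope projection onto (X - mu_x).
    mu_x is the covariate mean, assumed known from the covariate model."""
    n = len(ys)
    y_bar = sum(ys) / n
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # working linear regression of Y on X
    return y_bar - slope * (x_bar - mu_x)


# Noise-free toy data with Y = 2 * X + 1, so the target mean is 2 * mu_x + 1.
xs = [0.0, 1.0, 2.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
estimate = adjusted_mean(ys, xs, mu_x=3.0)
```

The subtracted term `slope * (x_bar - mu_x)` has mean close to zero in large samples whatever the true outcome-covariate relationship is, which is the simplified analogue of the abstract's remark that the subtracted term is always asymptotically centered.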

Electronic Journal of Statistics, 18(2), 4640-4666. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11633646/pdf/
Citations: 0
Reproducible parameter inference using bagged posteriors.
IF 1.3 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2024-01-01 | Epub Date: 2024-03-28 | DOI: 10.1214/24-ejs2237
Jonathan H Huggins, Jeffrey W Miller

Under model misspecification, it is known that Bayesian posteriors often do not properly quantify uncertainty about true or pseudo-true parameters. Even more fundamentally, misspecification leads to a lack of reproducibility in the sense that the same model will yield contradictory posteriors on independent data sets from the true distribution. To define a criterion for reproducible uncertainty quantification under misspecification, we consider the probability that two credible sets constructed from independent data sets have nonempty overlap, and we establish a lower bound on this overlap probability that holds whenever the credible sets are valid confidence sets. We prove that credible sets from the standard posterior can strongly violate this bound, indicating that it is not internally coherent under misspecification. To improve reproducibility in an easy-to-use and widely applicable way, we propose to apply bagging to the Bayesian posterior ("BayesBag"); that is, to use the average of posterior distributions conditioned on bootstrapped datasets. We motivate BayesBag from first principles based on Jeffrey conditionalization and show that the bagged posterior typically satisfies the overlap lower bound. Further, we prove a Bernstein-von Mises theorem for the bagged posterior, establishing its asymptotic normal distribution. We demonstrate the benefits of BayesBag via simulation experiments and an application to crime rate prediction.
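
The definition of the bagged posterior as "the average of posterior distributions conditioned on bootstrapped datasets" can be sketched for a toy conjugate model. The sketch assumes a normal mean with known unit variance, bootstrap resamples of the same size as the data, and 200 resamples. The function names and all numeric settings are hypothetical choices for illustration, not taken from the paper.

```python
import random
import statistics


def posterior_mean_var(data, sigma2=1.0, prior_mean=0.0, prior_var=100.0):
    """Conjugate posterior for a normal mean with known variance sigma2."""
    n = len(data)
    precision = 1.0 / prior_var + n / sigma2
    mean = (prior_mean / prior_var + sum(data) / sigma2) / precision
    return mean, 1.0 / precision


def bagged_posterior_mean_var(data, n_boot=200, seed=0):
    """Mean and variance of the equal-weight mixture of posteriors
    fitted to bootstrap resamples of the data."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        m, v = posterior_mean_var(resample)
        means.append(m)
        variances.append(v)
    mix_mean = statistics.fmean(means)
    # Law of total variance: within-posterior plus between-posterior spread.
    mix_var = statistics.fmean(variances) + statistics.pvariance(means)
    return mix_mean, mix_var


rng = random.Random(1)
# Data are more dispersed than the model's assumed unit variance.
data = [rng.gauss(0.0, 3.0) for _ in range(50)]
std_mean, std_var = posterior_mean_var(data)
bag_mean, bag_var = bagged_posterior_mean_var(data)
```

In this toy model every bootstrap posterior has the same variance as the standard posterior, so the bagged variance can only be larger: it adds the spread of the bootstrap posterior means, which reflects the actual dispersion of the data instead of the assumed one.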

Electronic Journal of Statistics, 18(1), 1549-1585. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12588188/pdf/
Citations: 0
Should we estimate a product of density functions by a product of estimators?
IF 1.1 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2103
F. Comte, C. Duval
Citations: 0
Statistical inference via conditional Bayesian posteriors in high-dimensional linear regression
IF 1.1 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2113
Teng Wu, Naveen N. Narisetty, Yun Yang
Citations: 2
Subnetwork estimation for spatial autoregressive models in large-scale networks
IF 1.1 | Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2139
Xuetong Li, Feifei Wang, Wei Lan, Hansheng Wang
Researchers commonly encounter large-scale networks in practice (e.g., Facebook and Twitter). In order to study the network interaction between different nodes of large-scale networks, the spatial autoregressive (SAR) model has been widely used. Despite its popularity, the estimation of a SAR model on large-scale networks remains very challenging. On the one hand, due to policy limitations or high collection costs, it is often impossible for independent researchers to observe or collect all network information. On the other hand, even if the entire network is accessible, estimating the SAR model using the quasi-maximum likelihood estimator (QMLE) could be computationally infeasible due to its high computational cost. To address these challenges, we propose here a subnetwork estimation method based on QMLE for the SAR model. By using appropriate sampling methods, a subnetwork, consisting of a much-reduced number of nodes, can be constructed. Subsequently, the standard QMLE can be computed by treating the sampled subnetwork as if it were the entire network. This leads to a significant reduction in information collection and model computation costs, which increases the practical feasibility of the effort. Theoretically, we show that the subnetwork-based QMLE is consistent and asymptotically normal under appropriate regularity conditions. Extensive simulation studies, based on both simulated and real network structures, are presented.
Citations: 0
On nonparametric estimation for cross-sectional sampled data under stationarity
Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2163
Kwun Chuen Gary Chan, Hok Kan Ling, Sheung Chi Phillip Yam
Citations: 0
Envelopes and principal component regression
Zone 4, Mathematics | Q3 Statistics & Probability | Pub Date: 2023-01-01 | DOI: 10.1214/23-ejs2154
Xin Zhang, Kai Deng, Qing Mai
Envelope methods offer targeted dimension reduction for various statistical models. The goal is to improve efficiency in multivariate parameter estimation by projecting the data onto a lower-dimensional subspace known as the envelope. Envelope approaches have advantages in analyzing data with highly correlated variables, but their iterative Grassmannian optimization algorithms do not scale well to high-dimensional data. While the connections between envelopes and partial least squares in multivariate linear regression have promoted recent progress in high-dimensional studies of envelopes, we propose a more straightforward way of envelope modeling from a new principal component regression perspective. The proposed procedure, Non-Iterative Envelope Component Estimation (NIECE), has excellent computational advantages over the iterative Grassmannian optimization alternatives in high dimensions. We develop a unified theory that bridges the gap between envelope methods and principal components in regression. The new theoretical insights also shed light on the envelope subspace estimation error as a function of eigenvalue gaps of two symmetric positive definite matrices used in envelope modeling. We apply the new theory and algorithm to several envelope models, including response and predictor reduction in multivariate linear models, logistic regression, and the Cox proportional hazards model. Simulations and illustrative data analysis show the potential for NIECE to improve standard methods in linear and generalized linear models significantly.
Citations: 2