
Computational Statistics & Data Analysis: Latest Articles

Regression analysis of elliptically symmetric directional data
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-03-03; DOI: 10.1016/j.csda.2025.108167
Zehao Yu, Xianzheng Huang
A comprehensive toolkit is developed for regression analysis of directional data based on a flexible class of angular Gaussian distributions. Informative testing procedures to assess rotational symmetry around the mean direction, and the dependence of model parameters on covariates are proposed. Bootstrap-based algorithms are provided to assess the significance of the proposed test statistics. Moreover, a prediction region that achieves the smallest volume in a class of ellipsoidal prediction regions of the same coverage probability is constructed. The efficacy of these inference procedures is demonstrated in simulation experiments. Finally, this new toolkit is used to analyze directional data originating from a hydrology study and a bioinformatics application.
Computational Statistics & Data Analysis, Vol. 208, Article 108167
Citations: 0
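The abstract of "Regression analysis of elliptically symmetric directional data" builds on angular Gaussian distributions, which can be understood as a multivariate normal vector projected onto the unit sphere. The following is a minimal sketch of that construction only, not the authors' regression toolkit. The function name `sample_angular_gaussian`, the mean vector, and the Cholesky factor are illustrative assumptions, and the paper's elliptically symmetric class may place further constraints on the parameters that are not enforced here.

```python
import math
import random


def sample_angular_gaussian(mu, chol, rng):
    """Draw one unit vector Y = X / ||X|| with X ~ N(mu, chol @ chol^T).

    mu   : list of d floats (mean of the underlying normal; hypothetical input)
    chol : d x d lower-triangular Cholesky factor as nested lists
    rng  : a random.Random instance
    """
    d = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # standard normal draws
    # x = mu + chol @ z, using only the lower triangle of chol
    x = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(d)]
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]  # project onto the unit sphere


rng = random.Random(0)
direction = sample_angular_gaussian(
    [1.0, 0.0, 2.0],
    [[1.0, 0.0, 0.0], [0.3, 1.0, 0.0], [0.0, 0.2, 0.5]],
    rng,
)
```

In a regression setting, the mean vector and covariance factor would depend on covariates; how that dependence is parameterized is specific to the paper and is not shown here.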
An accurate computational approach for partial likelihood using Poisson-binomial distributions
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-02-24; DOI: 10.1016/j.csda.2025.108161
Youngjin Cho, Yili Hong, Pang Du
In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk score ratios based on a continuous time model, and thus result in parameter estimates from only an approximate partial likelihood. Through a revisit to the original partial likelihood idea, an accurate partial likelihood computing method for the Cox model is proposed, which calculates the exact conditional probability using the Poisson-binomial distribution. New estimating and inference procedures are developed, and theoretical results are established for the proposed computational procedure. Although ties are common in real studies, current theories for the Cox model mostly do not consider cases for tied data. In contrast, the new approach includes the theory for grouped data, which allows ties, and also includes the theory for continuous data without ties, providing a unified framework for computing partial likelihood for data with or without ties. Numerical results show that the proposed method outperforms current methods in reducing bias and mean squared error, while achieving improved confidence interval coverage rates, especially when there are many ties or when the variability in risk scores is large. Comparisons between methods in real applications have been made.
Computational Statistics & Data Analysis, Vol. 208, Article 108161
Citations: 0
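The abstract of "An accurate computational approach for partial likelihood using Poisson-binomial distributions" relies on the Poisson-binomial distribution, that is, the distribution of the number of successes among independent Bernoulli trials with unequal success probabilities. As a rough illustration of the distribution itself (not of the authors' partial-likelihood procedure), its probability mass function can be computed by a simple dynamic program. The function name `poisson_binomial_pmf` and the example probabilities are assumptions made for this sketch.

```python
def poisson_binomial_pmf(probs):
    """Return [P(S=0), ..., P(S=n)] for S = sum of independent Bernoulli(p_i)."""
    pmf = [1.0]  # distribution of the empty sum: S = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: count stays at k
            nxt[k + 1] += mass * p      # this trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


# Hypothetical example: three subjects with different event probabilities.
example = poisson_binomial_pmf([0.1, 0.4, 0.7])
```

The recursion costs O(n^2) operations for n trials; faster transform-based methods exist, but the quadratic version is the easiest to check by hand.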
Communication-efficient estimation and inference for high-dimensional longitudinal data
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-02-24; DOI: 10.1016/j.csda.2025.108154
Xing Li, Yanjing Peng, Lei Wang
With the rapid growth in modern science and technology, distributed longitudinal data have drawn attention in a wide range of aspects. Realizing that not all effects of covariates are our parameters of interest, we focus on the distributed estimation and statistical inference of a pre-conceived low-dimensional parameter in the high-dimensional longitudinal GLMs with canonical links. To mitigate the impact of high-dimensional nuisance parameters and incorporate the within-subject correlation simultaneously, a decorrelated quadratic inference function is proposed for enhancing the estimation efficiency. Two communication-efficient surrogate decorrelated score estimators based on multi-round iterative algorithms are proposed. The error bounds and limiting distribution of the proposed estimators are established and extensive numerical experiments demonstrate the effectiveness of our method. An application to the National Longitudinal Survey of Youth Dataset is also presented.
Computational Statistics & Data Analysis, Vol. 208, Article 108154
Citations: 0
Testing the constancy of the variance for time series with a trend
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-02-21; DOI: 10.1016/j.csda.2025.108147
Lei Jin, Li Cai, Suojin Wang
The assumption of constant variance is fundamental in numerous statistical procedures for time series analysis. Nonlinear time series may exhibit time-varying local conditional variance, even when they are globally homoscedastic. Two novel tests are proposed to assess the constancy of variance in time series with a possible time-varying mean trend. Unlike previous approaches, the new tests rely on Walsh transformations of squared processes after recentering the time series data. It is shown that the corresponding Walsh coefficients have desirable properties, such as asymptotic independence. Both a max-type statistic and an order selection statistic are developed, along with their asymptotic null distributions. Furthermore, the consistency of the proposed statistics under a sequence of local alternatives is established. An extensive simulation study is conducted to examine the finite-sample performance of the procedures in comparison with existing methodologies. The empirical results show that the proposed methods are more powerful in many situations while maintaining reasonable Type I error rates, especially for nonlinear time series. The proposed methods are applied to test the global homoscedasticity of a financial time series, a well log time series with a non-constant mean structure, and a vibration time series.
Computational Statistics & Data Analysis, Vol. 208, Article 108147
Citations: 0
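The abstract of "Testing the constancy of the variance for time series with a trend" describes tests built on Walsh transformations of squared, recentered series. The sketch below shows only the basic ingredients under simplifying assumptions: it recenters by the global sample mean (the paper allows a time-varying mean trend, which would need a trend estimate instead) and applies an unnormalized fast Walsh-Hadamard transform in natural (Hadamard) ordering rather than sequency ordering. The function names and the toy series are hypothetical, and no test statistic from the paper is reproduced.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform, natural (Hadamard) ordering.

    len(values) must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    return out


def centered_squares(series):
    """Square the series after subtracting its global mean (a simplification)."""
    mean = sum(series) / len(series)
    return [(x - mean) ** 2 for x in series]


# Toy series of length 8 whose spread increases in the second half.
toy_series = [0.1, -0.2, 0.15, -0.1, 1.5, -1.8, 2.0, -1.6]
coefficients = fwht(centered_squares(toy_series))
```

Under constant variance, the coefficients other than the first would be expected to fluctuate around zero; large coefficients point to variance changes at the corresponding pattern. How such coefficients are combined into a formal test is the subject of the paper.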
Exact designs for order-of-addition experiments under a transition-effect model
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-02-21; DOI: 10.1016/j.csda.2025.108162
Jiayi Zheng, Nicholas Rios
In the chemical, pharmaceutical, and food industries, sometimes the order of adding a set of components has an impact on the final product. These are instances of the Order-of-Addition (OofA) problem, which aims to find the optimal sequence of the components. Extensive research on this topic has been conducted, but almost all designs are found by optimizing the D-optimality criterion. However, when prediction of the response is important, there is still a need for I-optimal designs. Furthermore, designs are needed for experiments where some orders are infeasible due to constraints. A new model for OofA experiments is presented that uses transition effects to model the effect of order on the response. Three algorithms are proposed to find D- and I-efficient exact designs under this new model: Simulated Annealing, a metaheuristic algorithm, Bubble Sorting, a greedy local optimization algorithm, and the Greedy Randomized Adaptive Search Procedure (GRASP), another metaheuristic algorithm. These three algorithms are generalized to handle block constraints, where components are grouped into blocks with a fixed order. Finally, two examples are shown to illustrate the effectiveness of the proposed designs and models, even under block constraints.
Computational Statistics & Data Analysis, Vol. 208, Article 108162
Citations: 0
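The abstract of "Exact designs for order-of-addition experiments under a transition-effect model" names simulated annealing as one of its search algorithms. As a generic illustration of simulated annealing over orderings, the sketch below searches for a single low-cost permutation by swapping two positions at a time. It does not search over designs (sets of orders) and does not use the D- or I-criterion; the cost function `toy_cost`, the cooling schedule, and all parameter values are placeholder assumptions.

```python
import math
import random


def anneal_order(components, cost, rng, steps=2000, t0=1.0, cooling=0.995):
    """Simulated annealing over permutations using random pairwise swaps."""
    current = list(components)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)  # pick two distinct positions
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


def toy_cost(order):
    """Placeholder objective: total displacement from the sorted order."""
    return sum(abs(position - comp) for position, comp in enumerate(order))


start = [3, 1, 4, 0, 2]
best_order, best_value = anneal_order(start, toy_cost, random.Random(42))
```

Replacing `toy_cost` with a design criterion and the swap move with a move on whole designs would bring the sketch closer to the setting of the paper, but those details are not given in the abstract.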
Efficient sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-02-10; DOI: 10.1016/j.csda.2025.108146
Alexander C. McLain, Anja Zgodic, Howard Bondell
Bayesian variable selection methods are powerful techniques for fitting sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. A computationally efficient and powerful Bayesian approach is presented for sparse high-dimensional linear regression, requiring only minimal prior assumptions on parameters through plug-in empirical Bayes estimates of hyperparameters. The method employs a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm to estimate maximum a posteriori (MAP) values of parameters via computationally efficient coordinate-wise optimization. The popular two-group approach to multiple testing motivates the E-step, resulting in a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm for sparse high-dimensional linear regression. Both one-at-a-time and all-at-once optimization can be used to complete PROBE. Extensive simulation studies and analyses of cancer cell drug responses are conducted to compare PROBE's empirical properties with those of related methods. Implementation is available through the R package probe.
Computational Statistics & Data Analysis, Vol. 207, Article 108146
Citations: 0
Differentially private estimation of weighted average treatment effects for binary outcomes
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-02-10; DOI: 10.1016/j.csda.2025.108145
Sharmistha Guha, Jerome P. Reiter
In the social and health sciences, researchers often make causal inferences using sensitive variables. These researchers, as well as the data holders themselves, may be ethically and perhaps legally obligated to protect the confidentiality of study participants' data. It is now known that releasing any statistics, including estimates of causal effects, computed with confidential data leaks information about the underlying data values. Thus, analysts may desire to use causal estimators that can provably bound this information leakage. Motivated by this goal, new algorithms are developed for estimating weighted average treatment effects with binary outcomes that satisfy the criterion of differential privacy. Theoretical results are presented on the accuracy of several differentially private estimators of weighted average treatment effects. Empirical evaluations using simulated data and a causal analysis involving education and income data illustrate the performance of these estimators.
Computational Statistics & Data Analysis, Vol. 207, Article 108145
Citations: 0
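The abstract of "Differentially private estimation of weighted average treatment effects for binary outcomes" refers to the criterion of differential privacy. To give a feel for what a differentially private release of a binary-outcome summary can look like, the sketch below applies the standard Laplace mechanism to a sample proportion. This is a textbook mechanism swapped in for illustration, not one of the authors' estimators. It assumes the sample size is public and that neighboring datasets differ by replacing one record, so the proportion has sensitivity 1/n. The function names and data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_proportion(outcomes, epsilon, rng):
    """Release the proportion of ones in `outcomes` with epsilon-DP.

    Assumes n = len(outcomes) is public and one record may be replaced,
    so the proportion changes by at most 1/n (its sensitivity).
    """
    n = len(outcomes)
    true_prop = sum(outcomes) / n
    sensitivity = 1.0 / n
    noisy = true_prop + laplace_noise(sensitivity / epsilon, rng)
    # Clamping to [0, 1] is post-processing and does not weaken the guarantee.
    return min(1.0, max(0.0, noisy))


# Hypothetical binary outcomes for one treatment arm.
arm = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
released = dp_proportion(arm, epsilon=1.0, rng=random.Random(0))
```

A treatment-effect estimate combines several such quantities, and the privacy budget has to be split across them; how the paper handles weighting and budget allocation is not described in the abstract.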
Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-01-27; DOI: 10.1016/j.csda.2025.108141
Henri Pesonen, Jukka Corander
Approximate Bayesian computation (ABC) methods are standard tools for inferring parameters of complex models when the likelihood function is analytically intractable. A popular approach to improving the poor acceptance rate of the basic rejection sampling ABC algorithm is to use sequential Monte Carlo (ABC SMC) to produce a sequence of proposal distributions adapting towards the posterior, instead of generating values from the prior distribution of the model parameters. Proposal distribution for the subsequent iteration is typically obtained from a weighted set of samples, often called particles, of the current iteration of this sequence. Current methods for constructing these proposal distributions treat all the particles equivalently, regardless of the corresponding value generated by the sampler, which may lead to inefficiency when propagating the information across iterations of the algorithm. To improve sampler efficiency, a modified approach called stratified distance ABC SMC is introduced. The algorithm stratifies particles based on their distance between the corresponding synthetic and observed data, and then constructs distinct proposal distributions for all the strata. Taking into account the distribution of distances across the particle space leads to substantially improved acceptance rate of the rejection sampling. It is shown that further efficiency could be gained by using a newly proposed stopping rule for the sequential process based on the stratified posterior samples and these advances are demonstrated by several examples.
Computational Statistics & Data Analysis, Vol. 207, Article 108141
Citations: 0
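The abstract of "Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation" takes the basic rejection sampling ABC algorithm as its starting point. The sketch below shows that baseline only, on a toy problem: inferring the mean of a unit-variance normal from its sample mean, with a uniform prior. It does not implement ABC SMC or the stratified proposal construction; the prior range, tolerance, and function name are assumptions chosen for illustration.

```python
import random


def abc_rejection(observed_mean, n_obs, n_draws, tolerance, rng):
    """Basic rejection ABC for the mean of a N(theta, 1) model."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw a parameter from the prior
        synthetic = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        # Distance between summary statistics of synthetic and observed data.
        distance = abs(sum(synthetic) / n_obs - observed_mean)
        if distance <= tolerance:
            accepted.append(theta)  # keep parameters that reproduce the data
    return accepted


posterior_draws = abc_rejection(
    observed_mean=1.0, n_obs=20, n_draws=2000, tolerance=0.2, rng=random.Random(1)
)
acceptance_rate = len(posterior_draws) / 2000
```

Most prior draws are rejected because the prior is much wider than the posterior, which is the inefficiency that sequential schemes such as ABC SMC try to reduce by adapting the proposal distribution.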
Confidence intervals for tree-structured varying coefficients
IF 1.5; Zone 3 (Mathematics); Q3 Computer Science, Interdisciplinary Applications; Pub Date: 2025-01-27; DOI: 10.1016/j.csda.2025.108142
Nikolai Spuck, Matthias Schmid, Malte Monin, Moritz Berger
The tree-structured varying coefficient (TSVC) model is a flexible regression approach that allows the effects of covariates to vary with the values of the effect modifiers. Relevant effect modifiers are identified inherently using recursive partitioning techniques. To quantify uncertainty in TSVC models, a procedure to construct confidence intervals of the estimated partition-specific coefficients is proposed. This task constitutes a selective inference problem as the coefficients of a TSVC model result from data-driven model building. To account for this issue, a parametric bootstrap approach, which is tailored to the complex structure of TSVC, is introduced. Finite sample properties, particularly coverage proportions, of the proposed confidence intervals are evaluated in a simulation study. For illustration, applications to data from COVID-19 patients and from patients suffering from acute odontogenic infection are considered. The proposed approach may also be adapted for constructing confidence intervals for other tree-based methods.
Computational Statistics & Data Analysis, Volume 207, Article 108142.
Citations: 0
Efficient computation of sparse and robust maximum association estimators
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-01-26 DOI: 10.1016/j.csda.2025.108133
Pia Pfeiffer, Andreas Alfons, Peter Filzmoser
Robust statistical estimators offer resilience against outliers but are often computationally challenging, particularly in high-dimensional sparse settings. Modern optimization techniques are utilized for robust sparse association estimators without imposing constraints on the covariance structure. The approach splits the problem into a robust estimation phase, followed by optimization of a decoupled, biconvex problem to derive the sparse canonical vectors. An augmented Lagrangian algorithm, combined with a modified adaptive gradient descent method, induces sparsity through simultaneous updates of both canonical vectors. Results demonstrate improved precision over existing methods, with high-dimensional empirical examples illustrating the effectiveness of this approach. The methodology can also be extended to other robust sparse estimators.
Computational Statistics & Data Analysis, Volume 207, Article 108133.
Citations: 0