
Latest Articles in Computational Statistics & Data Analysis

Regression analysis of elliptically symmetric directional data
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-03-03 · DOI: 10.1016/j.csda.2025.108167
Zehao Yu, Xianzheng Huang
A comprehensive toolkit is developed for regression analysis of directional data based on a flexible class of angular Gaussian distributions. Informative testing procedures to assess rotational symmetry around the mean direction, and the dependence of model parameters on covariates are proposed. Bootstrap-based algorithms are provided to assess the significance of the proposed test statistics. Moreover, a prediction region that achieves the smallest volume in a class of ellipsoidal prediction regions of the same coverage probability is constructed. The efficacy of these inference procedures is demonstrated in simulation experiments. Finally, this new toolkit is used to analyze directional data originating from a hydrology study and a bioinformatics application.
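The angular Gaussian family underlying this toolkit is the projection of a multivariate normal onto the unit sphere. A minimal sampler illustrating that standard construction (the paper's covariate-dependent regression structure and elliptical-symmetry tests are not modeled here):

```python
import numpy as np

def sample_angular_gaussian(mu, sigma, n, rng=None):
    """Draw n directions from an angular (projected) Gaussian:
    sample z ~ N(mu, Sigma) and project each draw onto the unit sphere."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = rng.multivariate_normal(mu, sigma, size=n)
    return z / np.linalg.norm(z, axis=1, keepdims=True)
```

With a mean far from the origin and small covariance, the draws concentrate tightly around the mean direction, which is the regime where rotational-symmetry questions become interesting.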
Citations: 0
An accurate computational approach for partial likelihood using Poisson-binomial distributions
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-02-24 · DOI: 10.1016/j.csda.2025.108161
Youngjin Cho, Yili Hong, Pang Du
In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk-score ratios based on a continuous-time model, so the resulting parameter estimates come from only an approximate partial likelihood. Revisiting the original partial likelihood idea, an accurate partial-likelihood computing method for the Cox model is proposed, which calculates the exact conditional probability using the Poisson-binomial distribution. New estimation and inference procedures are developed, and theoretical results are established for the proposed computational procedure. Although ties are common in real studies, current theories for the Cox model mostly do not consider tied data. In contrast, the new approach includes theory for grouped data, which allows ties, as well as for continuous data without ties, providing a unified framework for computing the partial likelihood with or without ties. Numerical results show that the proposed method outperforms current methods in reducing bias and mean squared error while achieving improved confidence-interval coverage rates, especially when there are many ties or when the variability in risk scores is large. Comparisons between methods are also made in real applications.
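The exact conditional probability at the heart of this idea can be illustrated in the single-failure case: given that exactly one subject in the risk set fails, the probability that it is subject i follows from the Poisson-binomial distribution of the failure count. A sketch, where the discrete failure probabilities `p` are hypothetical stand-ins for the model's risk-score-based quantities:

```python
import numpy as np

def poisson_binomial_pmf(p):
    """Exact pmf of the number of successes among independent Bernoulli(p_j)
    trials, built by repeated convolution (dynamic programming)."""
    pmf = np.array([1.0])
    for pj in p:
        pmf = np.convolve(pmf, [1.0 - pj, pj])
    return pmf

def exact_conditional_prob(i, p):
    """P(subject i fails | exactly one failure among the risk set)."""
    others = np.delete(p, i)
    numerator = p[i] * np.prod(1.0 - others)   # i fails, the rest survive
    denominator = poisson_binomial_pmf(p)[1]   # exactly one failure overall
    return numerator / denominator
```

For comparison, the usual continuous-time approximation would use p_i / Σ_j p_j; the two agree only when all the p_j are small.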
Citations: 0
Communication-efficient estimation and inference for high-dimensional longitudinal data
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-02-24 · DOI: 10.1016/j.csda.2025.108154
Xing Li, Yanjing Peng, Lei Wang
With the rapid growth of modern science and technology, distributed longitudinal data have drawn attention across a wide range of fields. Since not all covariate effects are parameters of interest, the focus is on distributed estimation and statistical inference for a pre-specified low-dimensional parameter in high-dimensional longitudinal GLMs with canonical links. To mitigate the impact of high-dimensional nuisance parameters while incorporating the within-subject correlation, a decorrelated quadratic inference function is proposed to enhance estimation efficiency. Two communication-efficient surrogate decorrelated score estimators based on multi-round iterative algorithms are proposed. The error bounds and limiting distributions of the proposed estimators are established, and extensive numerical experiments demonstrate the effectiveness of the method. An application to the National Longitudinal Survey of Youth dataset is also presented.
Citations: 0
Testing the constancy of the variance for time series with a trend
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-02-21 · DOI: 10.1016/j.csda.2025.108147
Lei Jin, Li Cai, Suojin Wang
The assumption of constant variance is fundamental in numerous statistical procedures for time series analysis. Nonlinear time series may exhibit time-varying local conditional variance, even when they are globally homoscedastic. Two novel tests are proposed to assess the constancy of variance in time series with a possible time-varying mean trend. Unlike previous approaches, the new tests rely on Walsh transformations of squared processes after recentering the time series data. It is shown that the corresponding Walsh coefficients have desirable properties, such as asymptotic independence. Both a max-type statistic and an order selection statistic are developed, along with their asymptotic null distributions. Furthermore, the consistency of the proposed statistics under a sequence of local alternatives is established. An extensive simulation study is conducted to examine the finite-sample performance of the procedures in comparison with existing methodologies. The empirical results show that the proposed methods are more powerful in many situations while maintaining reasonable Type I error rates, especially for nonlinear time series. The proposed methods are applied to test the global homoscedasticity of a financial time series, a well log time series with a non-constant mean structure, and a vibration time series.
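Walsh coefficients of the squared, recentered series are the building blocks of both statistics. A self-contained orthonormal fast Walsh-Hadamard transform shows the transform step; the paper's recentering, scaling, and null calibration are not reproduced here:

```python
import numpy as np

def fwht(a):
    """Orthonormal fast Walsh-Hadamard transform.
    The input length must be a power of 2."""
    a = np.asarray(a, dtype=float).copy()
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y   # butterfly update
        h *= 2
    return a / np.sqrt(n)
```

A max-type statistic would then take the maximum absolute coefficient (excluding the mean term) of the squared, recentered series, while an order-selection statistic accumulates the leading coefficients.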
Citations: 0
Exact designs for order-of-addition experiments under a transition-effect model
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-02-21 · DOI: 10.1016/j.csda.2025.108162
Jiayi Zheng, Nicholas Rios
In the chemical, pharmaceutical, and food industries, the order in which a set of components is added can affect the final product. These are instances of the Order-of-Addition (OofA) problem, which aims to find the optimal sequence of the components. Extensive research on this topic has been conducted, but almost all designs are found by optimizing the D-optimality criterion. However, when prediction of the response is important, there is still a need for I-optimal designs. Furthermore, designs are needed for experiments where some orders are infeasible due to constraints. A new model for OofA experiments is presented that uses transition effects to model the effect of order on the response. Three algorithms are proposed to find D- and I-efficient exact designs under this new model: Simulated Annealing (a metaheuristic), Bubble Sorting (a greedy local optimization algorithm), and the Greedy Randomized Adaptive Search Procedure (GRASP, another metaheuristic). These three algorithms are generalized to handle block constraints, where components are grouped into blocks with a fixed order. Finally, two examples illustrate the effectiveness of the proposed designs and models, even under block constraints.
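Of the three search strategies, simulated annealing is the most generic and can be sketched independently of the design criterion; the `cost` function below is a placeholder for the D- or I-efficiency objective evaluated on a candidate order, not the paper's actual criterion:

```python
import math
import random

def simulated_annealing(n_components, cost, n_iter=2000, t0=1.0,
                        cooling=0.995, seed=0):
    """Search over component orders by randomly swapping two positions,
    accepting worse orders with a probability that shrinks as the
    temperature cools."""
    rng = random.Random(seed)
    order = list(range(n_components))
    cur = best = cost(order)
    best_order = order[:]
    t = t0
    for _ in range(n_iter):
        i, j = rng.sample(range(n_components), 2)
        cand = order[:]
        cand[i], cand[j] = cand[j], cand[i]      # propose a swap
        c = cost(cand)
        if c < cur or rng.random() < math.exp((cur - c) / t):
            order, cur = cand, c
            if c < best:
                best, best_order = c, cand[:]
        t *= cooling                             # geometric cooling schedule
    return best_order, best
```

Because `best` only ever decreases, the returned order is never worse than the starting one; on small component sets the global optimum is found quickly.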
Citations: 0
Efficient sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-02-10 · DOI: 10.1016/j.csda.2025.108146
Alexander C. McLain , Anja Zgodic , Howard Bondell
Bayesian variable selection methods are powerful techniques for fitting sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. A computationally efficient and powerful Bayesian approach is presented for sparse high-dimensional linear regression, requiring only minimal prior assumptions on parameters through plug-in empirical Bayes estimates of hyperparameters. The method employs a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm to estimate maximum a posteriori (MAP) values of parameters via computationally efficient coordinate-wise optimization. The popular two-group approach to multiple testing motivates the E-step, resulting in a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm for sparse high-dimensional linear regression. Both one-at-a-time and all-at-once optimization can be used to complete PROBE. Extensive simulation studies and analyses of cancer cell drug responses are conducted to compare PROBE's empirical properties with those of related methods. Implementation is available through the R package probe.
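The two-group device motivating the E-step can be illustrated with a toy version: a coefficient estimate b is scored under a null and a non-null Gaussian component, giving a posterior inclusion probability. The mixture parameters below are hypothetical fixed values; PROBE estimates its hyperparameters by empirical Bayes rather than fixing them:

```python
import numpy as np

def inclusion_prob(b, pi, sd1, sd0=1.0):
    """Posterior probability that estimate b comes from the non-null
    component N(0, sd1^2) of a two-group mixture, versus the null
    component N(0, sd0^2), with prior inclusion probability pi."""
    def npdf(x, s):
        return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    f1, f0 = npdf(b, sd1), npdf(b, sd0)
    return pi * f1 / (pi * f1 + (1.0 - pi) * f0)
```

Small estimates are shrunk toward the null, while large ones receive inclusion probability near one, which is the behavior the E-step exploits coordinate-wise.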
Citations: 0
Differentially private estimation of weighted average treatment effects for binary outcomes
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-02-10 · DOI: 10.1016/j.csda.2025.108145
Sharmistha Guha, Jerome P. Reiter
In the social and health sciences, researchers often make causal inferences using sensitive variables. These researchers, as well as the data holders themselves, may be ethically and perhaps legally obligated to protect the confidentiality of study participants' data. It is now known that releasing any statistics, including estimates of causal effects, computed with confidential data leaks information about the underlying data values. Thus, analysts may desire to use causal estimators that can provably bound this information leakage. Motivated by this goal, new algorithms are developed for estimating weighted average treatment effects with binary outcomes that satisfy the criterion of differential privacy. Theoretical results are presented on the accuracy of several differentially private estimators of weighted average treatment effects. Empirical evaluations using simulated data and a causal analysis involving education and income data illustrate the performance of these estimators.
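The basic privacy device can be shown with the Laplace mechanism: for binary outcomes a group sum changes by at most 1 when one record changes, so calibrated Laplace noise on each group mean yields an ε-differentially-private difference of means. This is a generic sketch assuming public group sizes and unweighted means, not the paper's weighted estimators:

```python
import numpy as np

def dp_mean(y, epsilon, rng):
    """Laplace mechanism for the mean of a 0/1 vector: the sum has
    sensitivity 1, so Laplace(1/epsilon) noise on the sum gives
    epsilon-DP (group size treated as public)."""
    return (y.sum() + rng.laplace(scale=1.0 / epsilon)) / len(y)

def dp_ate(y, t, epsilon, rng=None):
    """epsilon-DP difference of treated and control means for binary
    outcomes, splitting the privacy budget across the two groups."""
    rng = np.random.default_rng(0) if rng is None else rng
    return dp_mean(y[t == 1], epsilon / 2, rng) - dp_mean(y[t == 0], epsilon / 2, rng)
```

The noise scale grows as ε shrinks, which is the accuracy-privacy trade-off the paper's theoretical results quantify for its weighted estimators.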
Citations: 0
Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-01-27 · DOI: 10.1016/j.csda.2025.108141
Henri Pesonen, Jukka Corander
Approximate Bayesian computation (ABC) methods are standard tools for inferring the parameters of complex models when the likelihood function is analytically intractable. A popular approach to improving the poor acceptance rate of the basic rejection-sampling ABC algorithm is to use sequential Monte Carlo (ABC SMC) to produce a sequence of proposal distributions adapting towards the posterior, instead of generating values from the prior distribution of the model parameters. The proposal distribution for each subsequent iteration is typically obtained from a weighted set of samples, often called particles, from the current iteration. Current methods for constructing these proposal distributions treat all particles equivalently, regardless of the corresponding distance generated by the sampler, which may lead to inefficiency when propagating information across iterations. To improve sampler efficiency, a modified approach called stratified distance ABC SMC is introduced. The algorithm stratifies particles by the distance between their corresponding synthetic data and the observed data, and then constructs distinct proposal distributions for each stratum. Taking into account the distribution of distances across the particle space leads to a substantially improved acceptance rate of the rejection sampling. It is further shown that additional efficiency can be gained from a newly proposed stopping rule for the sequential process based on the stratified posterior samples, and these advances are demonstrated by several examples.
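The stratification step can be sketched as follows: accepted particles are binned by distance quantiles, and a separate Gaussian proposal (mean and covariance) is fitted within each stratum. The weighting, perturbation kernels, and stopping rule of the full ABC SMC algorithm are omitted, so this is only the core idea:

```python
import numpy as np

def stratified_proposals(particles, distances, n_strata=3):
    """Bin particles (n x d array) into equal-probability distance strata
    and fit a Gaussian proposal (mean, covariance) per stratum."""
    edges = np.quantile(distances, np.linspace(0.0, 1.0, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, distances, side="right") - 1,
                     0, n_strata - 1)
    proposals = []
    for s in range(n_strata):
        members = particles[strata == s]
        mean = members.mean(axis=0)
        # small ridge keeps the covariance positive definite
        cov = np.atleast_2d(np.cov(members.T)) + 1e-9 * np.eye(members.shape[1])
        proposals.append((mean, cov))
    return strata, proposals
```

Particles close to the observed data then drive their own proposal instead of being averaged together with poorly matching ones.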
Citations: 0
Confidence intervals for tree-structured varying coefficients
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-01-27 · DOI: 10.1016/j.csda.2025.108142
Nikolai Spuck, Matthias Schmid, Malte Monin, Moritz Berger
The tree-structured varying coefficient (TSVC) model is a flexible regression approach that allows the effects of covariates to vary with the values of the effect modifiers. Relevant effect modifiers are identified inherently using recursive partitioning techniques. To quantify uncertainty in TSVC models, a procedure to construct confidence intervals of the estimated partition-specific coefficients is proposed. This task constitutes a selective inference problem as the coefficients of a TSVC model result from data-driven model building. To account for this issue, a parametric bootstrap approach, which is tailored to the complex structure of TSVC, is introduced. Finite sample properties, particularly coverage proportions, of the proposed confidence intervals are evaluated in a simulation study. For illustration, applications to data from COVID-19 patients and from patients suffering from acute odontogenic infection are considered. The proposed approach may also be adapted for constructing confidence intervals for other tree-based methods.
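The parametric-bootstrap device itself is standard and can be sketched generically: simulate new datasets from the fitted model, re-estimate, and take percentile limits. The TSVC-specific tailoring (re-running the tree-structured selection on each bootstrap sample to account for the selective-inference problem) is what the paper contributes and is not shown:

```python
import numpy as np

def parametric_bootstrap_ci(estimate_fn, simulate_fn, theta_hat,
                            B=500, alpha=0.05, rng=None):
    """Percentile confidence interval from B parametric-bootstrap
    replicates: simulate from the fitted model at theta_hat, re-estimate,
    and take the alpha/2 and 1 - alpha/2 empirical quantiles."""
    rng = np.random.default_rng(0) if rng is None else rng
    boot = np.array([estimate_fn(simulate_fn(theta_hat, rng))
                     for _ in range(B)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])
```

For a data-driven model like TSVC, `estimate_fn` would need to repeat the whole partitioning procedure, not just re-fit fixed coefficients.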
Citations: 0
Efficient computation of sparse and robust maximum association estimators
IF 1.5 · CAS Tier 3 (Mathematics) · Q3 Computer Science, Interdisciplinary Applications · Pub Date: 2025-01-26 · DOI: 10.1016/j.csda.2025.108133
Pia Pfeiffer, Andreas Alfons, Peter Filzmoser
Robust statistical estimators offer resilience against outliers but are often computationally challenging, particularly in high-dimensional sparse settings. Modern optimization techniques are utilized for robust sparse association estimators without imposing constraints on the covariance structure. The approach splits the problem into a robust estimation phase, followed by optimization of a decoupled, biconvex problem to derive the sparse canonical vectors. An augmented Lagrangian algorithm, combined with a modified adaptive gradient descent method, induces sparsity through simultaneous updates of both canonical vectors. Results demonstrate improved precision over existing methods, with high-dimensional empirical examples illustrating the effectiveness of this approach. The methodology can also be extended to other robust sparse estimators.
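The sparsity-inducing update can be illustrated with the soft-thresholding operator, the proximal map of the L1 penalty that is commonly embedded in gradient schemes of this kind; the augmented-Lagrangian and adaptive-step machinery of the paper is omitted:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry toward zero
    by lam and set entries with magnitude below lam exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
```

Applying this after each gradient step zeroes out small canonical-vector entries, which is how simultaneous updates of both vectors can stay sparse.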
Citations: 0
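The alternating updates of the two canonical vectors described in the abstract can be illustrated with a minimal sketch of sparse canonical correlation via soft-thresholded power iterations (in the style of penalized matrix decomposition). This is a simplified stand-in under stated assumptions, not the authors' augmented-Lagrangian method, and it uses the classical (non-robust) cross-covariance; `sparse_cca` and its parameters are illustrative names.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding: the proximal step that induces sparsity."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_cca(X, Y, lam_a=0.1, lam_b=0.1, n_iter=100):
    """One pair of sparse canonical vectors via alternating
    soft-thresholded power updates on the cross-covariance."""
    n, p = X.shape
    q = Y.shape[1]
    Sxy = X.T @ Y / n                  # classical cross-covariance estimate
    b = np.full(q, 1.0 / np.sqrt(q))   # start from a dense unit direction
    a = np.zeros(p)
    for _ in range(n_iter):
        # Update each canonical vector in turn, shrinking small entries to zero.
        a = soft_threshold(Sxy @ b, lam_a)
        na = np.linalg.norm(a)
        if na > 0:
            a /= na
        b = soft_threshold(Sxy.T @ a, lam_b)
        nb = np.linalg.norm(b)
        if nb > 0:
            b /= nb
    return a, b
```

A robust variant in the spirit of the paper would replace `Sxy` with a robust association estimate computed in a separate estimation phase before the sparse optimization.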
Journal: Computational Statistics & Data Analysis