Regression analysis of elliptically symmetric directional data
Computational Statistics & Data Analysis, Vol. 208, Article 108167 | Pub Date: 2025-03-03 | DOI: 10.1016/j.csda.2025.108167
Zehao Yu, Xianzheng Huang
A comprehensive toolkit is developed for regression analysis of directional data based on a flexible class of angular Gaussian distributions. Informative testing procedures are proposed to assess rotational symmetry around the mean direction and the dependence of model parameters on covariates. Bootstrap-based algorithms are provided to assess the significance of the proposed test statistics. Moreover, a prediction region is constructed that achieves the smallest volume within a class of ellipsoidal prediction regions of the same coverage probability. The efficacy of these inference procedures is demonstrated in simulation experiments. Finally, the new toolkit is used to analyze directional data originating from a hydrology study and a bioinformatics application.
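As a concrete illustration of the angular Gaussian family underlying this toolkit, the sketch below generates directional responses on the unit sphere by normalizing multivariate normal vectors whose mean depends on a covariate; the linear covariate effect, the parameter values, and the diagonal covariance are hypothetical choices, not the model fitted in the paper.

```python
# A minimal sketch (not the authors' toolkit): directional responses on the unit
# sphere generated from an angular (projected) Gaussian distribution, i.e. the
# normalized version of a multivariate normal vector. The linear covariate
# effect and all parameter values below are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(0)

def sample_angular_gaussian(mu, Sigma, rng):
    """Draw one unit vector y = z / ||z|| with z ~ N(mu, Sigma)."""
    z = rng.multivariate_normal(mu, Sigma)
    return z / np.linalg.norm(z)

n, d = 200, 3                       # sample size, ambient dimension of the sphere
x = rng.uniform(-1, 1, size=n)      # a single scalar covariate
B = np.array([[1.0, 0.5, -0.3],     # hypothetical intercept and slope for the mean vector
              [0.2, -0.4, 0.8]])
Sigma = np.diag([1.0, 1.0, 0.6])    # non-spherical covariance -> elliptical symmetry

Y = np.vstack([sample_angular_gaussian(B[0] + x[i] * B[1], Sigma, rng) for i in range(n)])
print(Y[:3], np.linalg.norm(Y, axis=1)[:3])   # rows are unit vectors
```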
{"title":"Regression analysis of elliptically symmetric directional data","authors":"Zehao Yu, Xianzheng Huang","doi":"10.1016/j.csda.2025.108167","DOIUrl":"10.1016/j.csda.2025.108167","url":null,"abstract":"<div><div>A comprehensive toolkit is developed for regression analysis of directional data based on a flexible class of angular Gaussian distributions. Informative testing procedures to assess rotational symmetry around the mean direction, and the dependence of model parameters on covariates are proposed. Bootstrap-based algorithms are provided to assess the significance of the proposed test statistics. Moreover, a prediction region that achieves the smallest volume in a class of ellipsoidal prediction regions of the same coverage probability is constructed. The efficacy of these inference procedures is demonstrated in simulation experiments. Finally, this new toolkit is used to analyze directional data originating from a hydrology study and a bioinformatics application.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"208 ","pages":"Article 108167"},"PeriodicalIF":1.5,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An accurate computational approach for partial likelihood using Poisson-binomial distributions
Computational Statistics & Data Analysis, Vol. 208, Article 108161 | Pub Date: 2025-02-24 | DOI: 10.1016/j.csda.2025.108161
Youngjin Cho, Yili Hong, Pang Du
In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk score ratios based on a continuous-time model, so parameter estimates are obtained from only an approximate partial likelihood. Revisiting the original partial likelihood idea, an accurate partial-likelihood computing method for the Cox model is proposed that calculates the exact conditional probability using the Poisson-binomial distribution. New estimation and inference procedures are developed, and theoretical results are established for the proposed computational procedure. Although ties are common in real studies, current theory for the Cox model mostly does not cover tied data. In contrast, the new approach includes theory for grouped data, which allows ties, as well as theory for continuous data without ties, providing a unified framework for computing the partial likelihood for data with or without ties. Numerical results show that the proposed method outperforms current methods in reducing bias and mean squared error while achieving improved confidence interval coverage rates, especially when there are many ties or when the variability in risk scores is large. Comparisons between methods are also made in real applications.
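The building block named in the abstract, the Poisson-binomial distribution, can be evaluated exactly by a simple dynamic-programming convolution, as in the generic sketch below; it shows only this PMF computation with hypothetical event probabilities, not the authors' full partial-likelihood estimation procedure.

```python
# A generic dynamic-programming evaluation of the Poisson-binomial PMF, the
# distribution of the number of events among subjects with unequal event
# probabilities. This is only the building block the abstract refers to, not
# the authors' partial-likelihood procedure; the probabilities are hypothetical.
import numpy as np

def poisson_binomial_pmf(p):
    """PMF of the sum of independent Bernoulli(p_i), as an array of length len(p)+1."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])   # add one Bernoulli trial at a time
    return pmf

p = np.array([0.10, 0.25, 0.40, 0.05])   # hypothetical event probabilities in a risk set
pmf = poisson_binomial_pmf(p)
print(pmf, pmf.sum())                     # probabilities of 0,1,...,4 events; sums to 1
```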
{"title":"An accurate computational approach for partial likelihood using Poisson-binomial distributions","authors":"Youngjin Cho, Yili Hong, Pang Du","doi":"10.1016/j.csda.2025.108161","DOIUrl":"10.1016/j.csda.2025.108161","url":null,"abstract":"<div><div>In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk score ratios based on a continuous time model, and thus result in parameter estimates from only an approximate partial likelihood. Through a revisit to the original partial likelihood idea, an accurate partial likelihood computing method for the Cox model is proposed, which calculates the exact conditional probability using the Poisson-binomial distribution. New estimating and inference procedures are developed, and theoretical results are established for the proposed computational procedure. Although ties are common in real studies, current theories for the Cox model mostly do not consider cases for tied data. In contrast, the new approach includes the theory for grouped data, which allows ties, and also includes the theory for continuous data without ties, providing a unified framework for computing partial likelihood for data with or without ties. Numerical results show that the proposed method outperforms current methods in reducing bias and mean squared error, while achieving improved confidence interval coverage rates, especially when there are many ties or when the variability in risk scores is large. Comparisons between methods in real applications have been made.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"208 ","pages":"Article 108161"},"PeriodicalIF":1.5,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143519899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Communication-efficient estimation and inference for high-dimensional longitudinal data
Computational Statistics & Data Analysis, Vol. 208, Article 108154 | Pub Date: 2025-02-24 | DOI: 10.1016/j.csda.2025.108154
Xing Li, Yanjing Peng, Lei Wang
With the rapid growth of modern science and technology, distributed longitudinal data have drawn attention in a wide range of fields. Recognizing that not all covariate effects are parameters of interest, the focus is on distributed estimation and statistical inference for a pre-conceived low-dimensional parameter in high-dimensional longitudinal GLMs with canonical links. To mitigate the impact of high-dimensional nuisance parameters while incorporating the within-subject correlation, a decorrelated quadratic inference function is proposed to enhance estimation efficiency. Two communication-efficient surrogate decorrelated score estimators based on multi-round iterative algorithms are proposed. Error bounds and the limiting distribution of the proposed estimators are established, and extensive numerical experiments demonstrate the effectiveness of the method. An application to the National Longitudinal Survey of Youth dataset is also presented.
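The communication-efficiency idea (sites transmit low-dimensional gradient summaries rather than raw data) can be sketched as below for an independent-data logistic model; this schematic ignores the within-subject correlation and the decorrelated quadratic inference function that the proposed estimators rely on, and all data and tuning values are hypothetical.

```python
# A schematic sketch of the communication-efficient idea only: each site sends a
# low-dimensional summary (its local gradient at the current estimate) rather than
# raw data, and a central site updates the parameter over multiple rounds. This is
# not the paper's decorrelated estimator; model, data and step size are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def local_logistic_gradient(X, y, beta):
    """Average gradient of the logistic log-likelihood on one site's data."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (y - p) / len(y)

# hypothetical data split across 5 sites
beta_true = np.array([1.0, -0.5, 0.0])
sites = []
for _ in range(5):
    X = rng.normal(size=(500, 3))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    sites.append((X, y))

beta = np.zeros(3)
for _ in range(200):                                   # multi-round iterative updates
    grad = np.mean([local_logistic_gradient(X, y, beta) for X, y in sites], axis=0)
    beta += 0.5 * grad                                  # central gradient-ascent step
print(beta)
```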
{"title":"Communication-efficient estimation and inference for high-dimensional longitudinal data","authors":"Xing Li, Yanjing Peng, Lei Wang","doi":"10.1016/j.csda.2025.108154","DOIUrl":"10.1016/j.csda.2025.108154","url":null,"abstract":"<div><div>With the rapid growth in modern science and technology, distributed longitudinal data have drawn attention in a wide range of aspects. Realizing that not all effects of covariates are our parameters of interest, we focus on the distributed estimation and statistical inference of a pre-conceived low-dimensional parameter in the high-dimensional longitudinal GLMs with canonical links. To mitigate the impact of high-dimensional nuisance parameters and incorporate the within-subject correlation simultaneously, a decorrelated quadratic inference function is proposed for enhancing the estimation efficiency. Two communication-efficient surrogate decorrelated score estimators based on multi-round iterative algorithms are proposed. The error bounds and limiting distribution of the proposed estimators are established and extensive numerical experiments demonstrate the effectiveness of our method. An application to the National Longitudinal Survey of Youth Dataset is also presented.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"208 ","pages":"Article 108154"},"PeriodicalIF":1.5,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143479095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing the constancy of the variance for time series with a trend
Computational Statistics & Data Analysis, Vol. 208, Article 108147 | Pub Date: 2025-02-21 | DOI: 10.1016/j.csda.2025.108147
Lei Jin, Li Cai, Suojin Wang
The assumption of constant variance is fundamental in numerous statistical procedures for time series analysis. Nonlinear time series may exhibit time-varying local conditional variance, even when they are globally homoscedastic. Two novel tests are proposed to assess the constancy of variance in time series with a possible time-varying mean trend. Unlike previous approaches, the new tests rely on Walsh transformations of squared processes after recentering the time series data. It is shown that the corresponding Walsh coefficients have desirable properties, such as asymptotic independence. Both a max-type statistic and an order selection statistic are developed, along with their asymptotic null distributions. Furthermore, the consistency of the proposed statistics under a sequence of local alternatives is established. An extensive simulation study is conducted to examine the finite-sample performance of the procedures in comparison with existing methodologies. The empirical results show that the proposed methods are more powerful in many situations while maintaining reasonable Type I error rates, especially for nonlinear time series. The proposed methods are applied to test the global homoscedasticity of a financial time series, a well log time series with a non-constant mean structure, and a vibration time series.
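The Walsh-coefficient ingredient can be illustrated as follows: recenter the series, square it, and compute coefficients with respect to ±1-valued Walsh-type functions (here, rows of a Hadamard matrix), summarized by a max-type value; the normalization and null calibration used in the paper's actual test statistics are not reproduced here.

```python
# An illustration of the ingredient the tests are built on: Walsh coefficients of
# the squared, recentered series. The normalization and the max-type summary are
# only schematic, not the paper's test statistics or their null calibration.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(2)
n = 256                                   # power of two so a Hadamard matrix exists
x = rng.normal(size=n)
x[n // 2:] *= 2.0                          # variance doubles halfway -> heteroscedastic

u = (x - x.mean()) ** 2                    # square the recentered series
u = u - u.mean()
H = hadamard(n)                            # rows are +/-1-valued Walsh-type functions
coef = H @ u / np.sqrt(n)                  # Walsh coefficients
print(np.max(np.abs(coef[1:])))            # max-type summary over non-constant rows
```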
{"title":"Testing the constancy of the variance for time series with a trend","authors":"Lei Jin , Li Cai , Suojin Wang","doi":"10.1016/j.csda.2025.108147","DOIUrl":"10.1016/j.csda.2025.108147","url":null,"abstract":"<div><div>The assumption of constant variance is fundamental in numerous statistical procedures for time series analysis. Nonlinear time series may exhibit time-varying local conditional variance, even when they are globally homoscedastic. Two novel tests are proposed to assess the constancy of variance in time series with a possible time-varying mean trend. Unlike previous approaches, the new tests rely on Walsh transformations of squared processes after recentering the time series data. It is shown that the corresponding Walsh coefficients have desirable properties, such as asymptotic independence. Both a max-type statistic and an order selection statistic are developed, along with their asymptotic null distributions. Furthermore, the consistency of the proposed statistics under a sequence of local alternatives is established. An extensive simulation study is conducted to examine the finite-sample performance of the procedures in comparison with existing methodologies. The empirical results show that the proposed methods are more powerful in many situations while maintaining reasonable Type I error rates, especially for nonlinear time series. The proposed methods are applied to test the global homoscedasticity of a financial time series, a well log time series with a non-constant mean structure, and a vibration time series.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"208 ","pages":"Article 108147"},"PeriodicalIF":1.5,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143479096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exact designs for order-of-addition experiments under a transition-effect model
Computational Statistics & Data Analysis, Vol. 208, Article 108162 | Pub Date: 2025-02-21 | DOI: 10.1016/j.csda.2025.108162
Jiayi Zheng, Nicholas Rios
In the chemical, pharmaceutical, and food industries, sometimes the order of adding a set of components has an impact on the final product. These are instances of the Order-of-Addition (OofA) problem, which aims to find the optimal sequence of the components. Extensive research on this topic has been conducted, but almost all designs are found by optimizing the D-optimality criterion. However, when prediction of the response is important, there is still a need for I-optimal designs. Furthermore, designs are needed for experiments where some orders are infeasible due to constraints. A new model for OofA experiments is presented that uses transition effects to model the effect of order on the response. Three algorithms are proposed to find D- and I-efficient exact designs under this new model: Simulated Annealing, a metaheuristic algorithm; Bubble Sorting, a greedy local optimization algorithm; and the Greedy Randomized Adaptive Search Procedure (GRASP), another metaheuristic algorithm. These three algorithms are generalized to handle block constraints, where components are grouped into blocks with a fixed order. Finally, two examples are shown to illustrate the effectiveness of the proposed designs and models, even under block constraints.
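A stripped-down version of the simulated-annealing idea is sketched below: each run is an ordering of components, its model-matrix row records which adjacent transitions occur, and orderings are perturbed by adjacent swaps while tracking a D-type criterion; the transition coding, ridge stabilization, and cooling schedule are illustrative simplifications rather than the paper's exact model or algorithm.

```python
# A simplified simulated-annealing search for a D-efficient order-of-addition
# design under a toy transition-effect coding. The coding, the ridge term and the
# cooling schedule are illustrative stand-ins, not the paper's exact model/algorithm.
import itertools
import numpy as np

rng = np.random.default_rng(3)
m, n_runs = 4, 20
transitions = list(itertools.permutations(range(m), 2))     # all ordered pairs

def row(order):
    """Indicator row: 1 for each adjacent transition appearing in the ordering."""
    r = np.zeros(len(transitions))
    for a, b in zip(order[:-1], order[1:]):
        r[transitions.index((a, b))] = 1.0
    return r

def log_det(design):
    X = np.array([row(o) for o in design])
    return np.linalg.slogdet(X.T @ X + 1e-6 * np.eye(X.shape[1]))[1]   # ridge for stability

design = [list(rng.permutation(m)) for _ in range(n_runs)]
best_val = log_det(design)
T = 1.0
for it in range(2000):
    cand = [o.copy() for o in design]
    i, j = rng.integers(n_runs), rng.integers(m - 1)
    cand[i][j], cand[i][j + 1] = cand[i][j + 1], cand[i][j]             # swap adjacent components
    val, cur = log_det(cand), log_det(design)
    if val > cur or rng.uniform() < np.exp((val - cur) / T):            # annealing acceptance
        design = cand
        best_val = max(best_val, val)
    T *= 0.999                                                          # geometric cooling
print(best_val)
```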
{"title":"Exact designs for order-of-addition experiments under a transition-effect model","authors":"Jiayi Zheng, Nicholas Rios","doi":"10.1016/j.csda.2025.108162","DOIUrl":"10.1016/j.csda.2025.108162","url":null,"abstract":"<div><div>In the chemical, pharmaceutical, and food industries, sometimes the order of adding a set of components has an impact on the final product. These are instances of the Order-of-Addition (OofA) problem, which aims to find the optimal sequence of the components. Extensive research on this topic has been conducted, but almost all designs are found by optimizing the <em>D</em>−optimality criterion. However, when prediction of the response is important, there is still a need for <em>I</em>−optimal designs. Furthermore, designs are needed for experiments where some orders are infeasible due to constraints. A new model for OofA experiments is presented that uses transition effects to model the effect of order on the response. Three algorithms are proposed to find <em>D</em>− and <em>I</em>−efficient exact designs under this new model: Simulated Annealing, a metaheuristic algorithm, Bubble Sorting, a greedy local optimization algorithm, and the Greedy Randomized Adaptive Search Procedure (GRASP), another metaheuristic algorithm. These three algorithms are generalized to handle block constraints, where components are grouped into blocks with a fixed order. Finally, two examples are shown to illustrate the effectiveness of the proposed designs and models, even under block constraints.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"208 ","pages":"Article 108162"},"PeriodicalIF":1.5,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm
Computational Statistics & Data Analysis, Vol. 207, Article 108146 | Pub Date: 2025-02-10 | DOI: 10.1016/j.csda.2025.108146
Alexander C. McLain, Anja Zgodic, Howard Bondell
Bayesian variable selection methods are powerful techniques for fitting sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. A computationally efficient and powerful Bayesian approach is presented for sparse high-dimensional linear regression, requiring only minimal prior assumptions on parameters through plug-in empirical Bayes estimates of hyperparameters. The method employs a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm to estimate maximum a posteriori (MAP) values of parameters via computationally efficient coordinate-wise optimization. The popular two-group approach to multiple testing motivates the E-step, resulting in a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm for sparse high-dimensional linear regression. Both one-at-a-time and all-at-once optimization can be used to complete PROBE. Extensive simulation studies and analyses of cancer cell drug responses are conducted to compare PROBE's empirical properties with those of related methods. Implementation is available through the R package probe.
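The two-group idea that motivates the E-step can be illustrated with a generic Bayes-rule calculation of the posterior probability that a coefficient is non-null, given a null density, an alternative density, and a prior null proportion; the sketch below uses hypothetical densities and is not the PROBE update itself.

```python
# A generic two-group calculation of the kind the E-step is motivated by: the
# posterior probability that a coefficient is non-null given a null density f0,
# an alternative density f1 and a prior null proportion pi0. This is not the
# PROBE update itself; the densities and pi0 below are hypothetical.
import numpy as np
from scipy.stats import norm

pi0 = 0.9                                     # assumed proportion of null coefficients
z = np.array([0.1, -0.4, 3.2, 1.9, -2.8])     # hypothetical standardized effect estimates

f0 = norm.pdf(z, loc=0.0, scale=1.0)          # null density
f1 = norm.pdf(z, loc=0.0, scale=3.0)          # hypothetical non-null (spread-out) density
post_nonnull = (1 - pi0) * f1 / ((1 - pi0) * f1 + pi0 * f0)   # Bayes' rule
print(post_nonnull.round(3))
```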
{"title":"Efficient sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm","authors":"Alexander C. McLain , Anja Zgodic , Howard Bondell","doi":"10.1016/j.csda.2025.108146","DOIUrl":"10.1016/j.csda.2025.108146","url":null,"abstract":"<div><div>Bayesian variable selection methods are powerful techniques for fitting sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. A computationally efficient and powerful Bayesian approach is presented for sparse high-dimensional linear regression, requiring only minimal prior assumptions on parameters through plug-in empirical Bayes estimates of hyperparameters. The method employs a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm to estimate maximum <em>a posteriori</em> (MAP) values of parameters via computationally efficient coordinate-wise optimization. The popular two-group approach to multiple testing motivates the E-step, resulting in a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm for sparse high-dimensional linear regression. Both one-at-a-time and all-at-once optimization can be used to complete PROBE. Extensive simulation studies and analyses of cancer cell drug responses are conducted to compare PROBE's empirical properties with those of related methods. Implementation is available through the R package <span>probe</span>.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108146"},"PeriodicalIF":1.5,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differentially private estimation of weighted average treatment effects for binary outcomes
Computational Statistics & Data Analysis, Vol. 207, Article 108145 | Pub Date: 2025-02-10 | DOI: 10.1016/j.csda.2025.108145
Sharmistha Guha, Jerome P. Reiter
In the social and health sciences, researchers often make causal inferences using sensitive variables. These researchers, as well as the data holders themselves, may be ethically and perhaps legally obligated to protect the confidentiality of study participants' data. It is now known that releasing any statistics, including estimates of causal effects, computed with confidential data leaks information about the underlying data values. Thus, analysts may desire to use causal estimators that can provably bound this information leakage. Motivated by this goal, new algorithms are developed for estimating weighted average treatment effects with binary outcomes that satisfy the criterion of differential privacy. Theoretical results are presented on the accuracy of several differentially private estimators of weighted average treatment effects. Empirical evaluations using simulated data and a causal analysis involving education and income data illustrate the performance of these estimators.
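A standard way to bound information leakage from a released statistic is the Laplace mechanism, illustrated below for a weighted difference of outcome proportions; the data, weights, privacy budget, and in particular the assumed sensitivity bound are hypothetical, and this generic sketch is not one of the estimators proposed in the paper.

```python
# A generic Laplace-mechanism sketch, not the authors' estimators: a weighted
# difference of outcome proportions between treated and control units is released
# with Laplace noise scaled to an *assumed* sensitivity bound. Data, weights,
# epsilon and the sensitivity bound are all hypothetical.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
t = rng.binomial(1, 0.5, n)                     # treatment indicator
y = rng.binomial(1, 0.3 + 0.2 * t)              # binary outcome
w = rng.uniform(0.5, 2.0, n)                    # hypothetical bounded weights

def weighted_ate(y, t, w):
    return (np.average(y[t == 1], weights=w[t == 1])
            - np.average(y[t == 0], weights=w[t == 0]))

epsilon = 1.0                                   # privacy budget
delta_sens = 0.01                               # assumed sensitivity bound for this statistic
private_est = weighted_ate(y, t, w) + rng.laplace(scale=delta_sens / epsilon)
print(round(private_est, 3))
```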
{"title":"Differentially private estimation of weighted average treatment effects for binary outcomes","authors":"Sharmistha Guha , Jerome P. Reiter","doi":"10.1016/j.csda.2025.108145","DOIUrl":"10.1016/j.csda.2025.108145","url":null,"abstract":"<div><div>In the social and health sciences, researchers often make causal inferences using sensitive variables. These researchers, as well as the data holders themselves, may be ethically and perhaps legally obligated to protect the confidentiality of study participants' data. It is now known that releasing any statistics, including estimates of causal effects, computed with confidential data leaks information about the underlying data values. Thus, analysts may desire to use causal estimators that can provably bound this information leakage. Motivated by this goal, new algorithms are developed for estimating weighted average treatment effects with binary outcomes that satisfy the criterion of differential privacy. Theoretical results are presented on the accuracy of several differentially private estimators of weighted average treatment effects. Empirical evaluations using simulated data and a causal analysis involving education and income data illustrate the performance of these estimators.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108145"},"PeriodicalIF":1.5,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation
Computational Statistics & Data Analysis, Vol. 207, Article 108141 | Pub Date: 2025-01-27 | DOI: 10.1016/j.csda.2025.108141
Henri Pesonen, Jukka Corander
Approximate Bayesian computation (ABC) methods are standard tools for inferring parameters of complex models when the likelihood function is analytically intractable. A popular approach to improving the poor acceptance rate of the basic rejection-sampling ABC algorithm is to use sequential Monte Carlo (ABC SMC) to produce a sequence of proposal distributions adapting towards the posterior, instead of generating values from the prior distribution of the model parameters. The proposal distribution for the subsequent iteration is typically obtained from a weighted set of samples, often called particles, of the current iteration of this sequence. Current methods for constructing these proposal distributions treat all the particles equivalently, regardless of the corresponding value generated by the sampler, which may lead to inefficiency when propagating information across iterations of the algorithm. To improve sampler efficiency, a modified approach called stratified distance ABC SMC is introduced. The algorithm stratifies particles based on the distance between their corresponding synthetic data and the observed data, and then constructs a distinct proposal distribution for each stratum. Taking into account the distribution of distances across the particle space leads to a substantially improved acceptance rate in the rejection sampling. It is shown that further efficiency can be gained by using a newly proposed stopping rule for the sequential process based on the stratified posterior samples, and these advances are demonstrated by several examples.
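The stratification idea can be illustrated on a toy normal-mean problem: after one rejection step, accepted particles are split by distance quantiles and a separate Gaussian proposal is fitted within each stratum, instead of a single proposal for all particles; the model, distance, and tuning choices below are toy stand-ins, not the paper's ABC SMC algorithm or its stopping rule.

```python
# A toy illustration of the stratification idea: particles from one ABC iteration
# are split by quantiles of their synthetic-vs-observed distance, and a separate
# Gaussian proposal is fitted to each stratum. Model, distance and tuning values
# are toy stand-ins, not the paper's ABC SMC algorithm or stopping rule.
import numpy as np

rng = np.random.default_rng(5)
y_obs = rng.normal(2.0, 1.0, size=50)

def distance(theta):
    y_sim = rng.normal(theta, 1.0, size=50)          # synthetic data under theta
    return abs(y_sim.mean() - y_obs.mean())

# one rejection-ABC iteration from the prior
thetas = rng.uniform(-5, 5, size=2000)
dists = np.array([distance(th) for th in thetas])
keep = dists < np.quantile(dists, 0.1)                # accept the closest 10%
particles, part_d = thetas[keep], dists[keep]

# stratify accepted particles by distance and fit one proposal per stratum
edges = np.quantile(part_d, [0.0, 0.5, 1.0])
for lo, hi in zip(edges[:-1], edges[1:]):
    stratum = particles[(part_d >= lo) & (part_d <= hi)]
    print(f"stratum [{lo:.3f}, {hi:.3f}]: proposal N({stratum.mean():.2f}, {stratum.std():.2f}^2)")
```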
{"title":"Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation","authors":"Henri Pesonen , Jukka Corander","doi":"10.1016/j.csda.2025.108141","DOIUrl":"10.1016/j.csda.2025.108141","url":null,"abstract":"<div><div>Approximate Bayesian computation (ABC) methods are standard tools for inferring parameters of complex models when the likelihood function is analytically intractable. A popular approach to improving the poor acceptance rate of the basic rejection sampling ABC algorithm is to use sequential Monte Carlo (ABC SMC) to produce a sequence of proposal distributions adapting towards the posterior, instead of generating values from the prior distribution of the model parameters. Proposal distribution for the subsequent iteration is typically obtained from a weighted set of samples, often called particles, of the current iteration of this sequence. Current methods for constructing these proposal distributions treat all the particles equivalently, regardless of the corresponding value generated by the sampler, which may lead to inefficiency when propagating the information across iterations of the algorithm. To improve sampler efficiency, a modified approach called stratified distance ABC SMC is introduced. The algorithm stratifies particles based on their distance between the corresponding synthetic and observed data, and then constructs distinct proposal distributions for all the strata. Taking into account the distribution of distances across the particle space leads to substantially improved acceptance rate of the rejection sampling. It is shown that further efficiency could be gained by using a newly proposed stopping rule for the sequential process based on the stratified posterior samples and these advances are demonstrated by several examples.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108141"},"PeriodicalIF":1.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence intervals for tree-structured varying coefficients
Computational Statistics & Data Analysis, Vol. 207, Article 108142 | Pub Date: 2025-01-27 | DOI: 10.1016/j.csda.2025.108142
Nikolai Spuck, Matthias Schmid, Malte Monin, Moritz Berger
The tree-structured varying coefficient (TSVC) model is a flexible regression approach that allows the effects of covariates to vary with the values of the effect modifiers. Relevant effect modifiers are identified inherently using recursive partitioning techniques. To quantify uncertainty in TSVC models, a procedure to construct confidence intervals of the estimated partition-specific coefficients is proposed. This task constitutes a selective inference problem as the coefficients of a TSVC model result from data-driven model building. To account for this issue, a parametric bootstrap approach, which is tailored to the complex structure of TSVC, is introduced. Finite sample properties, particularly coverage proportions, of the proposed confidence intervals are evaluated in a simulation study. For illustration, applications to data from COVID-19 patients and from patients suffering from acute odontogenic infection are considered. The proposed approach may also be adapted for constructing confidence intervals for other tree-based methods.
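The parametric bootstrap ingredient can be sketched in its simplest form: simulate new responses from the fitted model, refit, and take percentile intervals of the refitted coefficient; the toy below uses a plain linear model with hypothetical data, whereas the paper's procedure additionally handles the tree structure produced by recursive partitioning.

```python
# A bare-bones parametric bootstrap percentile interval for a fitted coefficient.
# The TSVC-specific tailoring (re-running the recursive partitioning on each
# bootstrap sample) is what the paper adds; this toy uses a plain linear model
# and hypothetical data.
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)
X = np.column_stack([np.ones_like(x), x])

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]        # (intercept, slope)

beta_hat = fit(X, y)
sigma_hat = np.std(y - X @ beta_hat, ddof=2)

boot = []
for _ in range(1000):                                   # simulate from the fitted model, refit
    y_star = X @ beta_hat + rng.normal(scale=sigma_hat, size=n)
    boot.append(fit(X, y_star)[1])
print(np.percentile(boot, [2.5, 97.5]))                 # percentile interval for the slope
```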
{"title":"Confidence intervals for tree-structured varying coefficients","authors":"Nikolai Spuck , Matthias Schmid , Malte Monin , Moritz Berger","doi":"10.1016/j.csda.2025.108142","DOIUrl":"10.1016/j.csda.2025.108142","url":null,"abstract":"<div><div>The tree-structured varying coefficient (TSVC) model is a flexible regression approach that allows the effects of covariates to vary with the values of the effect modifiers. Relevant effect modifiers are identified inherently using recursive partitioning techniques. To quantify uncertainty in TSVC models, a procedure to construct confidence intervals of the estimated partition-specific coefficients is proposed. This task constitutes a selective inference problem as the coefficients of a TSVC model result from data-driven model building. To account for this issue, a parametric bootstrap approach, which is tailored to the complex structure of TSVC, is introduced. Finite sample properties, particularly coverage proportions, of the proposed confidence intervals are evaluated in a simulation study. For illustration, applications to data from COVID-19 patients and from patients suffering from acute odontogenic infection are considered. The proposed approach may also be adapted for constructing confidence intervals for other tree-based methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108142"},"PeriodicalIF":1.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient computation of sparse and robust maximum association estimators
Computational Statistics & Data Analysis, Vol. 207, Article 108133 | Pub Date: 2025-01-26 | DOI: 10.1016/j.csda.2025.108133
Pia Pfeiffer, Andreas Alfons, Peter Filzmoser
Robust statistical estimators offer resilience against outliers but are often computationally challenging, particularly in high-dimensional sparse settings. Modern optimization techniques are utilized for robust sparse association estimators without imposing constraints on the covariance structure. The approach splits the problem into a robust estimation phase, followed by optimization of a decoupled, biconvex problem to derive the sparse canonical vectors. An augmented Lagrangian algorithm, combined with a modified adaptive gradient descent method, induces sparsity through simultaneous updates of both canonical vectors. Results demonstrate improved precision over existing methods, with high-dimensional empirical examples illustrating the effectiveness of this approach. The methodology can also be extended to other robust sparse estimators.
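One common way sparsity is induced in an association direction is soft-thresholding (the L1 proximal operator) inside an alternating, power-method-style update, sketched below; the robust estimation phase and the augmented Lagrangian algorithm of the paper are omitted, the cross-covariance is a plain non-robust sample estimate, and the threshold value is hypothetical.

```python
# An illustration of how sparsity is typically induced in an association
# direction: alternating updates with soft-thresholding (the L1 proximal operator)
# followed by renormalization. The robust estimation phase and the augmented
# Lagrangian algorithm of the paper are omitted; C is a plain (non-robust)
# cross-covariance and lam is a hypothetical tuning value.
import numpy as np

rng = np.random.default_rng(7)
n, p, q = 500, 10, 8
X = rng.normal(size=(n, p))
Y = 0.5 * X[:, :q] + rng.normal(size=(n, q))           # hypothetical associated blocks
C = (X - X.mean(0)).T @ (Y - Y.mean(0)) / n             # sample cross-covariance

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

lam = 0.02
a = rng.normal(size=p); a /= np.linalg.norm(a)
b = rng.normal(size=q); b /= np.linalg.norm(b)
for _ in range(100):                                    # alternate sparse updates of a and b
    a = soft_threshold(C @ b, lam);  a /= max(np.linalg.norm(a), 1e-12)
    b = soft_threshold(C.T @ a, lam); b /= max(np.linalg.norm(b), 1e-12)
print(np.round(a, 2), np.round(b, 2))
```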
{"title":"Efficient computation of sparse and robust maximum association estimators","authors":"Pia Pfeiffer , Andreas Alfons , Peter Filzmoser","doi":"10.1016/j.csda.2025.108133","DOIUrl":"10.1016/j.csda.2025.108133","url":null,"abstract":"<div><div>Robust statistical estimators offer resilience against outliers but are often computationally challenging, particularly in high-dimensional sparse settings. Modern optimization techniques are utilized for robust sparse association estimators without imposing constraints on the covariance structure. The approach splits the problem into a robust estimation phase, followed by optimization of a decoupled, biconvex problem to derive the sparse canonical vectors. An augmented Lagrangian algorithm, combined with a modified adaptive gradient descent method, induces sparsity through simultaneous updates of both canonical vectors. Results demonstrate improved precision over existing methods, with high-dimensional empirical examples illustrating the effectiveness of this approach. The methodology can also be extended to other robust sparse estimators.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108133"},"PeriodicalIF":1.5,"publicationDate":"2025-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}