Pub Date: 2026-01-24 | DOI: 10.1016/j.csda.2025.108338
Renewable penalized linear regression via inverse probability weighting for streaming data with missing covariates
Kang Meng, Yujie Gai
Computational Statistics & Data Analysis, vol. 219, Article 108338
A renewable weighted estimation method for linear regression with non-convex regularization is proposed, tailored for streaming data with missing covariates. The proposed method is implemented via a two-step estimation strategy. In the first step, a renewable formulation of the parameter of interest in the propensity score function is derived. Based on this, a renewable weighted optimization objective for the regression coefficients is constructed in the second step, which is updated using the current data and summary statistics from historical data. The objective is solved via a locally adaptive majorize-minimization algorithm with previous estimates as initialization, while the penalty parameter is determined using the proposed online rolling validation procedure. Theoretical results demonstrate that the renewable estimator is asymptotically normal and maintains estimation efficiency compared to offline methods that process all data at once. Simulation studies and real data analysis further confirm that the proposed estimator achieves competitive statistical performance while significantly improving computational efficiency and reducing memory requirements.
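The core "renewable" idea — refreshing an estimate from accumulated summary statistics rather than stored raw data — can be sketched for plain weighted least squares. This is an illustration of that idea only, not the authors' method: the paper adds inverse-probability weights from an estimated propensity score, non-convex penalties, and a majorize-minimization solver on top.

```python
import numpy as np

class RenewableWLS:
    """Streaming weighted least squares: the estimate is refreshed from
    summary statistics (S = sum w*x*x', b = sum w*x*y), so raw historical
    data never needs to be stored.  Illustrative sketch only -- the paper's
    method adds inverse-probability weighting, non-convex penalties, and a
    two-step propensity-score update on top of this idea."""

    def __init__(self, p):
        self.S = np.zeros((p, p))  # accumulated weighted Gram matrix
        self.b = np.zeros(p)       # accumulated weighted cross-products

    def update(self, X, y, w):
        # Fold a new data batch into the summary statistics.
        Xw = X * w[:, None]
        self.S += Xw.T @ X
        self.b += Xw.T @ y
        return np.linalg.solve(self.S, self.b)  # current coefficient estimate

rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5])
est = RenewableWLS(p=3)
for _ in range(20):                      # 20 arriving data batches
    X = rng.normal(size=(100, 3))
    y = X @ beta + rng.normal(size=100)
    w = np.ones(100)                     # placeholder for inverse-probability weights
    bhat = est.update(X, y, w)
```

After all batches, `bhat` matches the estimate an offline weighted least-squares fit on the pooled 2000 observations would give, while only a 3×3 matrix and a length-3 vector are kept in memory.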
Pub Date: 2026-01-21 | DOI: 10.1016/j.csda.2026.108345
A smoothed maximum rank correlation estimator for deep ordinal choice models
Yiwei Fan, Xiaoshi Lu, Xiaoling Lu
Computational Statistics & Data Analysis, vol. 219, Article 108345
A smoothed maximum rank correlation (MRC) estimator for ordinal choice models is introduced, combining a linear function with a nonlinear component modeled by deep neural networks to achieve both identifiability and interpretability. A two-step estimation algorithm is designed that maintains the order relations among outputs without relying on the parallelism assumption, which enhances its practical applicability. The statistical properties of the smoothed MRC estimator are established under regularity conditions, including identification, convergence rate, and minimax optimality, while allowing the number of categories to increase with sample size. Our theoretical results extend beyond ordinal choice models and apply to a broad range of generalized regression models. Extensive simulations demonstrate the superiority of the proposed method in classification accuracy and interpretability. Its effectiveness is further validated through applications to twelve benchmark datasets and an online education dataset.
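The smoothing step can be illustrated on the classical MRC criterion: the non-differentiable indicator 1{score_i > score_j} is replaced by a sigmoid with a bandwidth, making the objective amenable to gradient methods. A minimal sketch for a linear score follows; the bandwidth value and data-generating model are illustrative assumptions, and the paper's estimator additionally includes a deep-network component and a two-step fitting algorithm.

```python
import numpy as np

def smoothed_mrc(score, y, h=0.1):
    """Smoothed maximum rank correlation objective: the indicator
    1{score_i > score_j} in the MRC criterion is replaced by a sigmoid
    with bandwidth h, making the objective differentiable.  Sketch only --
    the paper combines a linear part with a deep-network component."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if y[i] > y[j]:
                diff = (score[i] - score[j]) / h
                # tanh form of the sigmoid avoids overflow for large |diff|
                total += 0.5 * (1.0 + np.tanh(diff / 2.0))
    return total / (n * (n - 1))

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
y_latent = x @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=200)
y = np.digitize(y_latent, [-1.0, 1.0])            # ordinal response, 3 categories
good = smoothed_mrc(x @ np.array([1.0, 1.0]), y)  # score aligned with the truth
bad = smoothed_mrc(x @ np.array([1.0, -1.0]), y)  # misaligned score
```

A score direction aligned with the true index yields a larger smoothed rank correlation than a misaligned one, which is the signal the estimator maximizes.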
Pub Date: 2026-01-20 | DOI: 10.1016/j.csda.2026.108344
Likelihood inference in Gaussian copula models for count time series via minimax exponential tilting
Quynh Nhu Nguyen, Victor De Oliveira
Computational Statistics & Data Analysis, vol. 218, Article 108344
Count time series arise in diverse contexts and may display a variety of distributional features, including overdispersion, zero-inflation, covariate effects, and complex dependence structures. A class of models with the potential to account for this diversity is that of Gaussian copulas, which are computationally challenging to fit. A scalable and accurate likelihood approximation strategy is proposed that employs minimax exponential tilting (MET) to fit Gaussian copula models with arbitrary marginals and ARMA latent processes to count time series. The proposed method, called Time Series Minimax Exponential Tilting (TMET), exploits the exact conditional structure of causal and invertible ARMA processes to construct an optimized importance sampling density. Costly Cholesky decompositions are avoided by using a simplified Innovations algorithm to recursively compute conditional means and variances, and computation is further accelerated through a sparse representation of the best linear prediction matrix. These innovations achieve linear computational complexity in the series length while preserving key theoretical guarantees, including vanishing relative error in rare-event regimes. Simulation studies show that TMET outperforms widely used methods, including the Geweke–Hajivassiliou–Keane (GHK) simulator and the recent Vecchia-based MET (VMET) approach, especially in scenarios with low counts, strong dependence, and moving-average latent processes. Beyond estimation, the copula framework is extended to include predictive inference and model diagnostics based on scoring rules and randomized quantile residuals. A real-world application to temperature data from the Kickapoo Downtown Airport in Texas demonstrates TMET's advantages over the commonly used GHK simulator.
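The ingredient credited for TMET's linear complexity — the innovations algorithm, which yields one-step conditional means and prediction variances recursively instead of via a Cholesky factorization — can be shown in its textbook MA(1) form. This is the generic recursion, not the paper's full TMET machinery, and the simulated series is an illustrative assumption.

```python
import numpy as np

def ma1_innovations(x, theta, sigma2=1.0):
    """Innovations algorithm specialized to an MA(1) process: recursively
    computes one-step best linear predictors and prediction variances in
    O(n), with no Cholesky factorization -- the trick the abstract credits
    for TMET's linear complexity.  Textbook recursion, not the paper's
    full method."""
    n = len(x)
    gamma0 = sigma2 * (1 + theta ** 2)   # lag-0 autocovariance
    pred = np.zeros(n)                   # pred[t] = E[x_t | x_0..x_{t-1}]
    v = np.empty(n)                      # one-step prediction variances
    v[0] = gamma0
    for t in range(1, n):
        th = sigma2 * theta / v[t - 1]   # lag-1 autocovariance / previous variance
        pred[t] = th * (x[t - 1] - pred[t - 1])
        v[t] = gamma0 - th ** 2 * v[t - 1]
    return pred, v

rng = np.random.default_rng(6)
theta = 0.6
e = rng.normal(size=6)
x = e[1:] + theta * e[:-1]               # MA(1) sample of length 5
pred, v = ma1_innovations(x, theta)
```

For a Gaussian MA(1), these recursions reproduce exactly the conditional means and variances one would get by solving with the full covariance matrix, which is how the sketch can be checked.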
Pub Date: 2026-01-17 | DOI: 10.1016/j.csda.2026.108341
Online and offline robust multivariate linear regression
Antoine Godichon-Baggioni, Stéphane Robin, Laure Sansonnet
Computational Statistics & Data Analysis, vol. 218, Article 108341
The robust estimation of the parameters of multivariate Gaussian linear regression models is considered by using robust versions of the usual (Mahalanobis) least-squares criterion, with or without Ridge regularization. Two methods of estimation are introduced: (i) online stochastic gradient descent algorithms and their averaged variants, and (ii) offline fixed-point algorithms. These methods are applied to both the standard and Mahalanobis least-squares criteria, as well as to their regularized counterparts. Under weak assumptions, the resulting estimators are shown to be asymptotically normal. Since the noise covariance matrix is generally unknown, a robust estimate of this matrix is incorporated into the Mahalanobis-based stochastic gradient descent algorithms. Numerical experiments on synthetic data demonstrate a substantial gain in robustness compared with classical least-squares estimators, while also highlighting the computational efficiency of the online procedures. All proposed algorithms are implemented in the R package RobRegression, available on CRAN.
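The robustness idea behind the online estimators — replacing the squared residual norm with the plain norm, which caps each observation's influence — can be sketched with averaged stochastic gradient descent. The step-size schedule and heavy-tailed noise below are illustrative assumptions; the paper additionally covers Mahalanobis weighting, Ridge regularization, and a robust noise-covariance estimate.

```python
import numpy as np

def robust_asgd(X, Y, c0=1.0, alpha=0.66):
    """Averaged SGD for the robust criterion E||Y - B'X||: using the
    (non-squared) norm bounds each observation's gradient contribution,
    which is the robustness mechanism behind the paper's online
    estimators.  Minimal sketch under assumed step sizes c0 / t^alpha."""
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q))
    B_avg = np.zeros((p, q))
    for t in range(n):
        r = Y[t] - X[t] @ B                  # residual vector for this observation
        nr = np.linalg.norm(r)
        if nr > 1e-12:
            grad = -np.outer(X[t], r / nr)   # gradient of ||Y - B'X|| in B
            B -= (c0 / (t + 1) ** alpha) * grad
        B_avg += (B - B_avg) / (t + 1)       # Polyak-Ruppert averaging
    return B_avg

rng = np.random.default_rng(2)
B_true = np.array([[1.0, 0.0], [0.0, -1.0], [0.5, 0.5]])
X = rng.normal(size=(20000, 3))
Y = X @ B_true + rng.standard_t(df=2, size=(20000, 2))  # heavy-tailed noise
B_hat = robust_asgd(X, Y)
```

With t(2) noise the classical least-squares estimator is badly behaved (infinite noise variance), whereas the norm-based gradient stays bounded by the regressor norm regardless of how extreme the residual is.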
Pub Date: 2026-01-17 | DOI: 10.1016/j.csda.2026.108342
Boosted sliced regression for dimension reduction in binary classification
Qin Wang, Edmund Osei
Computational Statistics & Data Analysis, vol. 218, Article 108342
Sufficient dimension reduction (SDR) aims to reduce data dimensionality without losing information about the conditional distribution of the response given its high-dimensional predictors. Most existing SDR methods were developed under a general regression model and may lose efficiency when the response is binary. A novel approach is proposed that combines gradient boosting machines (GBM) with sliced regression (SR) to effectively recover the central dimension reduction subspace in binary classification. Numerical experiments and real data applications demonstrate its superior performance and computational scalability.
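The limitation the method targets is easy to see in classical sliced inverse regression: with a binary response there are only two slices, so plain SIR can recover at most one direction of the central subspace. The sketch below is classical SIR, shown only to illustrate the slicing idea — it is not the paper's boosted sliced regression.

```python
import numpy as np

def sir_directions(X, y, n_dirs=1):
    """Sliced inverse regression: standardize X, slice by the response,
    and take leading eigenvectors of the between-slice covariance of the
    slice means.  With a binary y there are only two slices, hence at most
    one recoverable direction -- the limitation the paper's boosted sliced
    regression is designed to overcome.  Classical SIR, not the proposed
    method."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))
    Z = Xc @ L                                  # whitened predictors
    M = np.zeros((X.shape[1], X.shape[1]))
    for s in np.unique(y):
        mask = y == s
        m = Z[mask].mean(axis=0)
        M += mask.mean() * np.outer(m, m)       # between-slice covariance
    vals, vecs = np.linalg.eigh(M)
    B = L @ vecs[:, ::-1][:, :n_dirs]           # back to the original scale
    return B / np.linalg.norm(B, axis=0)

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 4))
beta = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta + 0.3 * rng.normal(size=5000) > 0).astype(int)
b = sir_directions(X, y)[:, 0]
```

Here the single recovered direction `b` aligns (up to sign) with the true index `beta`, but a second structural direction, if present, would be invisible to two-slice SIR.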
Pub Date: 2026-01-14 | DOI: 10.1016/j.csda.2026.108340
Copula-based mixtures of regression models for multivariate response data
Xuetong Cui, Orla A. Murphy, Paul D. McNicholas
Computational Statistics & Data Analysis, vol. 218, Article 108340
Clustering is a powerful technique for uncovering hidden patterns or subgroups within complex datasets. Recently, the use of mixtures of multiple linear regression models has gained popularity due to their ability to account for underlying heterogeneity in regression-type data and to provide a comprehensive understanding of covariate impacts across latent subgroups. However, models tailored for a multivariate response are relatively rare, especially when the response variables are dependent. Copula regression addresses this issue by employing copulas to model dependencies between response variables. Accordingly, a copula-based finite mixture of regression models is proposed for clustering and interpreting covariate effects in heterogeneous multivariate continuous response data. An expectation-conditional-maximization algorithm is used to estimate the model. Simulation studies and real-data analyses illustrate the improved clustering performance of the proposed models compared to existing methods.
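The E- and (conditional) M-steps underlying such mixtures can be illustrated in their simplest special case: EM for a mixture of univariate-response linear regressions with Gaussian errors, i.e. the no-copula, single-response version. All model choices below (two components, Gaussian errors, random initialization) are illustrative assumptions; the paper's ECM additionally updates copula parameters to capture dependence across multiple responses.

```python
import numpy as np

def em_mix_reg(X, Y, K=2, n_iter=200, seed=0):
    """EM for a mixture of linear regressions with Gaussian errors -- the
    no-copula, single-response special case of the paper's model, shown
    only to illustrate the E-step / conditional M-step structure."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(K, 1.0 / K)
    beta = rng.normal(size=(K, p))
    s2 = np.ones(K)
    for _ in range(n_iter):
        # E-step: responsibilities from component Gaussian log-densities.
        logd = np.stack([-0.5 * np.log(2 * np.pi * s2[k])
                         - (Y - X @ beta[k]) ** 2 / (2 * s2[k]) for k in range(K)])
        logd += np.log(pi)[:, None]
        r = np.exp(logd - logd.max(axis=0))
        r /= r.sum(axis=0)
        # CM-steps: weighted least squares and error variance per component.
        for k in range(K):
            w = r[k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(Xw.T @ X, Xw.T @ Y)
            s2[k] = (w * (Y - X @ beta[k]) ** 2).sum() / w.sum()
        pi = r.mean(axis=1)
    return pi, beta, s2

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(600), rng.normal(size=600)])
z = rng.random(600) < 0.5
Y = np.where(z, X @ [0.0, 2.0], X @ [0.0, -2.0]) + 0.3 * rng.normal(size=600)
pi, beta, s2 = em_mix_reg(X, Y)
```

On this well-separated two-component example the recovered slopes land near +2 and −2; with random initialization EM can in principle hit a poor local optimum, which is why mixture software typically tries several starts.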
Pub Date: 2026-01-09 | DOI: 10.1016/j.csda.2026.108339
Conditional independence test in factor models via projection correlation
Xilin Zhang, Hongxia Xu, Guoliang Fan, Liping Zhu
Computational Statistics & Data Analysis, vol. 218, Article 108339
Among existing methods for testing independence, projection correlation possesses several appealing properties: it is insensitive to the dimensions of the two random vectors, invariant under orthogonal transformations, and requires no tuning parameters or moment conditions for its estimation. This paper proposes a projection correlation-based approach for measuring and testing conditional dependence within a factor model framework. The proposed measure accommodates response vectors and common factors of varying dimensions while allowing the number of factors to grow to infinity with the sample size. The asymptotic properties of the projection correlation statistic are established under both the null and alternative hypotheses. In addition, a general approach is introduced for constructing dependency graphs without the Gaussian assumption, utilizing the proposed test. Numerical simulations and real data analysis demonstrate the superiority and practicality of the proposed methods.
Pub Date: 2025-12-31 | DOI: 10.1016/j.csda.2025.108337
Expectile periodogram
Tianbo Chen, Ta-Hsin Li, Hanbing Zhu, Wenwu Gao
Computational Statistics & Data Analysis, vol. 217, Article 108337
This paper introduces a novel periodogram-like function, called the expectile periodogram (EP), for modeling spectral features of time series and detecting hidden periodicities. The EP is constructed from trigonometric expectile regression (ER), in which a specially designed loss function is used to substitute the squared ℓ₂ norm that leads to the ordinary periodogram. The EP retains the key properties of the ordinary periodogram as a frequency-domain representation of serial dependence in time series, while offering a more comprehensive understanding by examining the data across the entire range of expectile levels. The asymptotic theory is established to investigate the relationship between the EP and the so-called expectile spectrum. Simulations demonstrate the efficiency of the EP in the presence of hidden periodicities. In addition, by leveraging the inherent two-dimensional nature of the EP, we train a deep learning model to classify earthquake waveform data. Notably, our approach outperforms alternative periodogram-based methods in terms of classification accuracy.
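The construction can be sketched directly from the abstract: at each Fourier frequency, regress the series on cosine/sine regressors under the asymmetric squared loss of expectile regression and record the squared amplitude. This is a hedged reconstruction, not the authors' code; the IRLS fitting scheme and bandwidth-free setup below are illustrative assumptions.

```python
import numpy as np

def expectile_periodogram(y, tau=0.5, n_iter=30):
    """Expectile periodogram sketch: at each Fourier frequency, fit a
    trigonometric expectile regression (asymmetric squared loss, solved by
    iteratively reweighted least squares) and record the squared amplitude.
    At tau = 0.5 this reduces to the ordinary periodogram up to scaling.
    Illustrative reconstruction from the abstract."""
    n = len(y)
    freqs = np.arange(1, (n - 1) // 2 + 1) / n
    ep = np.empty(len(freqs))
    t = np.arange(n)
    for k, f in enumerate(freqs):
        X = np.column_stack([np.ones(n), np.cos(2 * np.pi * f * t),
                             np.sin(2 * np.pi * f * t)])
        coef = np.zeros(3)
        for _ in range(n_iter):
            resid = y - X @ coef
            w = np.where(resid < 0, 1 - tau, tau)  # asymmetric expectile weights
            Xw = X * w[:, None]
            coef = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        ep[k] = coef[1] ** 2 + coef[2] ** 2        # squared amplitude at frequency f
    return freqs, ep

rng = np.random.default_rng(4)
n = 200
t = np.arange(n)
y = np.cos(2 * np.pi * 0.1 * t) + rng.normal(scale=0.5, size=n)
freqs, ep = expectile_periodogram(y, tau=0.8)
peak = freqs[np.argmax(ep)]
```

Even at an off-center expectile level (tau = 0.8), the hidden periodicity at frequency 0.1 shows up as the dominant peak, mirroring the abstract's claim about detecting periodicities across expectile levels.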
Pub Date: 2025-12-27 | DOI: 10.1016/j.csda.2025.108323
Random multiplication versus random sum: Autoregressive-like models with integer-valued random inputs
Abdelhakim Aknouche, Sónia Gouveia, Manuel G. Scotto
Computational Statistics & Data Analysis, vol. 217, Article 108323
A common approach to analyze time series of counts is to fit models based on random sum operators. As an alternative, this paper introduces time series models based on a random multiplication operator, which is simply the multiplication of a variable operand by an integer-valued random coefficient, whose mean is the constant operand. This operation is embedded into autoregressive-like models with integer-valued random inputs, referred to as RMINAR. Two special variants are studied, namely the ℕ₀-valued random coefficient autoregressive model and the ℕ₀-valued random coefficient multiplicative error model. Furthermore, ℤ-valued extensions are also considered. The dynamic structure of the proposed models is studied in detail. In particular, their corresponding solutions are everywhere strictly stationary and ergodic, which is not common in either the literature on integer-valued time series models or real-valued random coefficient autoregressive models. Accordingly, RMINAR model parameters are estimated using a four-stage weighted least squares estimator, with consistency and asymptotic normality established everywhere in the parameter space. Finally, the performance of the new RMINAR models is illustrated with simulated and empirical examples.
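A hedged reading of the random multiplication operator is: replace the constant coefficient α in an AR-type recursion by an integer-valued random coefficient with mean α. The sketch below simulates one such ℕ₀-valued recursion with Poisson coefficients and innovations; the specific distributions are illustrative assumptions, and the paper's RMINAR class and four-stage WLS estimator are more general.

```python
import numpy as np

def simulate_rminar1(n, alpha=0.4, lam=2.0, seed=0):
    """Simulate an N0-valued autoregressive-like recursion built on a
    random multiplication operator: X_t = K_t * X_{t-1} + eta_t, where K_t
    is an integer-valued random coefficient with mean alpha (Poisson here)
    and eta_t a Poisson(lam) innovation.  Illustrative reading of the
    abstract's operator, not the paper's exact model class."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam)
    for t in range(1, n):
        K = rng.poisson(alpha)               # random coefficient, E[K] = alpha
        x[t] = K * x[t - 1] + rng.poisson(lam)
    return x

x = simulate_rminar1(100000, alpha=0.4, lam=2.0)
# If a stationary mean m exists it solves m = alpha*m + lam, i.e. m = lam/(1-alpha).
```

With alpha = 0.4 and lam = 2, the implied stationary mean is 2 / 0.6 ≈ 3.33, and the long simulated sample average sits close to it.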
Pub Date: 2025-12-25 | DOI: 10.1016/j.csda.2025.108322
Pure error REML for analyzing data from multi-stratum designs
Steven G. Gilmour, Peter Goos, Heiko Großmann
Computational Statistics & Data Analysis, vol. 218, Article 108322
Since the dawn of response surface methodology, it has been recommended that designs include replicate points, so that pure error estimates of variance can be obtained and used to provide reliable estimated standard errors of the effects of factors. In designs with more than one stratum, such as split-plot and split-split-plot designs, it is less obvious how pure error estimates of the variance components should be obtained, and no pure error estimates are given by the popular residual maximum likelihood (REML) method of estimation. A method of pure error REML estimation of the variance components, using the full treatment model, is obtained by treating each combination of factor levels as a discrete treatment. This method is easy to implement using standard software and improved estimated standard errors of the fixed effects estimates can be obtained by applying the Kenward-Roger correction based on the pure error REML estimates. The new method is illustrated using several data sets and the performance of pure error REML is compared with the standard REML method. The results are comparable when the assumed response surface model is correct, but the new method is considerably more robust in the case of model misspecification.
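The key move — treating each exact combination of factor levels as a discrete treatment so that only replicates inform the error estimate — can be shown in its single-stratum form: the pooled pure-error variance from replicated design points. This sketch (with an assumed replicated 2² + center-point design) is the simple analogue of the idea; the paper extends it to REML estimation of several variance components in multi-stratum designs.

```python
import numpy as np

def pure_error_variance(levels, y):
    """Pooled pure-error variance: group runs by their exact factor-level
    combination (each combination treated as a discrete 'treatment') and
    pool the within-group sums of squares over replicated groups.
    Single-stratum analogue of the paper's pure error REML idea."""
    combos = {}
    for lv, obs in zip(map(tuple, levels), y):
        combos.setdefault(lv, []).append(obs)
    ss, df = 0.0, 0
    for obs in combos.values():
        if len(obs) > 1:              # only replicated combinations inform pure error
            obs = np.asarray(obs)
            ss += ((obs - obs.mean()) ** 2).sum()
            df += len(obs) - 1
    return ss / df

rng = np.random.default_rng(5)
# A 2^2 factorial plus a center point, each run replicated 4 times.
design = np.repeat(np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]]), 4, axis=0)
y = design @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=len(design))
s2 = pure_error_variance(design, y)
```

Because the estimate uses only within-replicate variation, it remains valid even if the assumed response surface model for the factor effects is wrong — the robustness property the paper emphasizes.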