In this paper, we discuss stochastic comparisons of the largest order statistics arising from two sets of dependent, distribution-free random variables with respect to multivariate chain majorization, where the dependence structure is defined by Archimedean copulas. When the matrix of parameters of a distribution-free model with possibly two parameter vectors changes to another matrix of parameters in a certain mathematical sense, we show that, under certain conditions, the first sample maximum is larger than the second sample maximum in the usual stochastic order. Applications of our results to the scale proportional reverse hazards model, the exponentiated gamma distribution, the Gompertz–Makeham distribution, and the location-scale model are also given. Two numerical examples illustrate the results established here.
{"title":"Usual stochastic ordering of the sample maxima from dependent distribution‐free random variables","authors":"Longxiang Fang, N. Balakrishnan, Wenyu Huang, Shuai Zhang","doi":"10.1111/stan.12275","DOIUrl":"https://doi.org/10.1111/stan.12275","url":null,"abstract":"In this paper, we discuss stochastic comparison of the largest order statistics arising from two sets of dependent distribution‐free random variables with respect to multivariate chain majorization, where the dependency structure can be defined by Archimedean copulas. When a distribution‐free model with possibly two parameter vectors has its matrix of parameters changing to another matrix of parameters in a certain mathematical sense, we obtain the first sample maxima is larger than the second sample maxima with respect to the usual stochastic order, based on certain conditions. Applications of our results for scale proportional reverse hazards model, exponentiated gamma distribution, Gompertz–Makeham distribution, and location‐scale model, are also given. Meanwhile, we provide two numerical examples to illustrate the results established here.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"76 1","pages":"112 - 99"},"PeriodicalIF":1.5,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89908645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The stratified logrank test can be used to compare the survival distributions of several groups of patients while adjusting for the effect of a discrete variable that may be predictive of the survival outcome. In practice, this discrete variable may be missing for some patients. An inverse-probability-weighted version of the stratified logrank statistic is introduced to tackle this issue. Its asymptotic distribution is derived under the null hypothesis of equality of the survival distributions. A simulation study is conducted to assess the behavior of the proposed test statistic in finite samples. An analysis of a medical dataset illustrates the methodology.
{"title":"Inverse‐probability‐weighted logrank test for stratified survival data with missing measurements","authors":"Rim Ben Elouefi, Foued Saâdaoui","doi":"10.1111/stan.12276","DOIUrl":"https://doi.org/10.1111/stan.12276","url":null,"abstract":"The stratified logrank test can be used to compare survival distributions of several groups of patients, while adjusting for the effect of some discrete variable that may be predictive of the survival outcome. In practice, it can happen that this discrete variable is missing for some patients. An inverse‐probability‐weighted version of the stratified logrank statistic is introduced to tackle this issue. Its asymptotic distribution is derived under the null hypothesis of equality of the survival distributions. A simulation study is conducted to assess behavior of the proposed test statistic in finite samples. An analysis of a medical dataset illustrates the methodology.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"29 1","pages":"113 - 129"},"PeriodicalIF":1.5,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82520985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study a statistical framework for replicability based on a recently proposed quantitative measure of replication success, the sceptical p-value. A recalibration is proposed to obtain exact overall Type-I error control if the effect is null in both studies, together with additional bounds on the partial and conditional Type-I error rates, which represent the case where only one study has a null effect. The approach avoids the double dichotomization for significance of the two-trials rule and has larger project power to detect existing effects over both studies in combination. It can also be used for power calculations, and it requires a smaller replication sample size than the two-trials rule for already convincing original studies. We illustrate the performance of the proposed methodology in an application to data from the Experimental Economics Replication Project.
{"title":"Assessing replicability with the sceptical p$$ p $$ ‐value: Type‐I error control and sample size planning","authors":"Charlotte Micheloud, F. Balabdaoui, L. Held","doi":"10.1111/stan.12312","DOIUrl":"https://doi.org/10.1111/stan.12312","url":null,"abstract":"We study a statistical framework for replicability based on a recently proposed quantitative measure of replication success, the sceptical p$$ p $$ ‐value. A recalibration is proposed to obtain exact overall Type‐I error control if the effect is null in both studies and additional bounds on the partial and conditional Type‐I error rate, which represent the case where only one study has a null effect. The approach avoids the double dichotomization for significance of the two‐trials rule and has larger project power to detect existing effects over both studies in combination. It can also be used for power calculations and requires a smaller replication sample size than the two‐trials rule for already convincing original studies. We illustrate the performance of the proposed methodology in an application to data from the Experimental Economics Replication Project.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"77 1","pages":"573 - 591"},"PeriodicalIF":1.5,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83870470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hypothesis testing is challenging when the test statistic is based on a regularized estimator in high dimensions, because of the statistic's complicated asymptotic distribution. We propose a robust testing framework for ℓ1-regularized M-estimators to cope with non-Gaussian regression errors, using the robust approximate message passing algorithm. The proposed framework enjoys an automatically built-in bias correction and is applicable with general convex nondifferentiable loss functions, which also allows inference when the focus is a conditional quantile instead of the mean of the response. The estimator compares numerically well with the debiased and desparsified approaches when using the least squares loss function. The use of the Huber loss function demonstrates that the proposed construction provides stable confidence intervals under different regression error distributions.
{"title":"Automatic bias correction for testing in high‐dimensional linear models","authors":"Jing Zhou, G. Claeskens","doi":"10.1111/stan.12274","DOIUrl":"https://doi.org/10.1111/stan.12274","url":null,"abstract":"Hypothesis testing is challenging due to the test statistic's complicated asymptotic distribution when it is based on a regularized estimator in high dimensions. We propose a robust testing framework for ℓ1$$ {ell}_1 $$ ‐regularized M‐estimators to cope with non‐Gaussian distributed regression errors, using the robust approximate message passing algorithm. The proposed framework enjoys an automatically built‐in bias correction and is applicable with general convex nondifferentiable loss functions which also allows inference when the focus is a conditional quantile instead of the mean of the response. The estimator compares numerically well with the debiased and desparsified approaches while using the least squares loss function. The use of the Huber loss function demonstrates that the proposed construction provides stable confidence intervals under different regression error distributions.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"57 1","pages":"71 - 98"},"PeriodicalIF":1.5,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86790588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is a matter of common observation that investors value substantial gains but are averse to heavy losses. Obvious as it may sound, this translates into an interesting preference for right-skewed return distributions, whose right tails are heavier than their left tails. Skewness is thus not only a way to describe the shape of a distribution, but also a tool for risk measurement. We review the statistical literature on skewness and provide a comprehensive framework for its assessment. Then, we present a new measure of skewness, based on the decomposition of variance into its upward and downward components. We argue that this measure fills a gap in the literature and show in a simulation study that it strikes a good balance between robustness and sensitivity.
{"title":"Assessing skewness in financial markets","authors":"Giovanni Campisi, L. La Rocca, S. Muzzioli","doi":"10.1111/stan.12273","DOIUrl":"https://doi.org/10.1111/stan.12273","url":null,"abstract":"It is a matter of common observation that investors value substantial gains but are averse to heavy losses. Obvious as it may sound, this translates into an interesting preference for right‐skewed return distributions, whose right tails are heavier than their left tails. Skewness is thus not only a way to describe the shape of a distribution, but also a tool for risk measurement. We review the statistical literature on skewness and provide a comprehensive framework for its assessment. Then, we present a new measure of skewness, based on the decomposition of variance in its upward and downward components. We argue that this measure fills a gap in the literature and show in a simulation study that it strikes a good balance between robustness and sensitivity.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"61 1","pages":"48 - 70"},"PeriodicalIF":1.5,"publicationDate":"2022-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88552120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero inflation is a common nuisance when monitoring disease progression over time. This article proposes a new observation-driven model for zero-inflated and over-dispersed count time series. The counts, given the past history of the process and available information on covariates, are assumed to follow a mixture of a Poisson distribution and a distribution degenerate at zero, with a time-dependent mixing probability πt. Since count data usually suffer from overdispersion, a Gamma distribution is used to model the excess variation, resulting in a zero-inflated negative binomial regression model with mean parameter λt. Linear predictors with autoregressive and moving average (ARMA) type terms, covariates, seasonality, and trend are fitted to λt and πt through canonical-link generalized linear models. Estimation is carried out by maximum likelihood aided by iterative algorithms such as Newton–Raphson and Expectation–Maximization. Theoretical results on the consistency and asymptotic normality of the estimators are given. The proposed model is illustrated through in-depth simulation studies and two disease datasets.
{"title":"Autoregressive and moving average models for zero‐inflated count time series","authors":"Vurukonda Sathish, S. Mukhopadhyay, R. Tiwari","doi":"10.1111/stan.12255","DOIUrl":"https://doi.org/10.1111/stan.12255","url":null,"abstract":"Zero inflation is a common nuisance while monitoring disease progression over time. This article proposes a new observation‐driven model for zero‐inflated and over‐dispersed count time series. The counts given from the past history of the process and available information on covariates are assumed to be distributed as a mixture of a Poisson distribution and a distribution degenerated at zero, with a time‐dependent mixing probability, πt . Since, count data usually suffers from overdispersion, a Gamma distribution is used to model the excess variation, resulting in a zero‐inflated negative binomial regression model with mean parameter λt . Linear predictors with autoregressive and moving average (ARMA) type terms, covariates, seasonality and trend are fitted to λt and πt through canonical link generalized linear models. Estimation is done using maximum likelihood aided by iterative algorithms, such as Newton‐Raphson (NR) and Expectation and Maximization. Theoretical results on the consistency and asymptotic normality of the estimators are given. The proposed model is illustrated using in‐depth simulation studies and two disease datasets.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"36 1 1","pages":"190 - 218"},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79903003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper considers a continuous three-phase polynomial regression model with two threshold points for dependent data with heteroscedasticity. We assume the model is polynomial of order zero in the middle regime and polynomial of higher orders elsewhere. We denote this model by ℳ2, which includes models with one or no threshold points, denoted by ℳ1 and ℳ0, respectively, as special cases. We provide an ordered iterative least squares (OiLS) method for estimating ℳ2 and establish the consistency of the OiLS estimators under mild conditions. When the underlying model is ℳ1 and is (d0−1)th-order differentiable but not d0th-order differentiable at the threshold point, we further show the Op(N−1/(d0+2)) convergence rate of the OiLS estimators, which can be faster than the Op(N−1/(2d0)) convergence rate given in Feder when d0≥3. We also apply a model-selection procedure for selecting ℳκ, κ=0,1,2. When the underlying model exists, we establish the selection consistency under the aforementioned conditions. Finally, we conduct simulation experiments to demonstrate the finite-sample performance of our asymptotic results.
{"title":"Threshold estimation for continuous three‐phase polynomial regression models with constant mean in the middle regime","authors":"Chih‐Hao Chang, Kam-Fai Wong, Wei‐Yee Lim","doi":"10.1111/stan.12268","DOIUrl":"https://doi.org/10.1111/stan.12268","url":null,"abstract":"This paper considers a continuous three‐phase polynomial regression model with two threshold points for dependent data with heteroscedasticity. We assume the model is polynomial of order zero in the middle regime, and is polynomial of higher orders elsewhere. We denote this model by ℳ2$$ {mathcal{M}}_2 $$ , which includes models with one or no threshold points, denoted by ℳ1$$ {mathcal{M}}_1 $$ and ℳ0$$ {mathcal{M}}_0 $$ , respectively, as special cases. We provide an ordered iterative least squares (OiLS) method when estimating ℳ2$$ {mathcal{M}}_2 $$ and establish the consistency of the OiLS estimators under mild conditions. When the underlying model is ℳ1$$ {mathcal{M}}_1 $$ and is (d0−1)$$ left({d}_0-1right) $$ th‐order differentiable but not d0$$ {d}_0 $$ th‐order differentiable at the threshold point, we further show the Op(N−1/(d0+2))$$ {O}_pleft({N}^{-1/left({d}_0+2right)}right) $$ convergence rate of the OiLS estimators, which can be faster than the Op(N−1/(2d0))$$ {O}_pleft({N}^{-1/left(2{d}_0right)}right) $$ convergence rate given in Feder when d0≥3$$ {d}_0ge 3 $$ . We also apply a model‐selection procedure for selecting ℳκ$$ {mathcal{M}}_{kappa } $$ ; κ=0,1,2$$ kappa =0,1,2 $$ . When the underlying model exists, we establish the selection consistency under the aforementioned conditions. Finally, we conduct simulation experiments to demonstrate the finite‐sample performance of our asymptotic results.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"13 1","pages":"4 - 47"},"PeriodicalIF":1.5,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87649922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Faced with massive data, subsampling is a popular way to downsize the data volume and reduce the computational burden. The key idea of subsampling is to perform statistical analysis on a representative subsample drawn from the full data; it provides a practical solution for extracting useful information from big data. In this article, we develop an efficient subsampling method for large-scale multiplicative regression models, which can largely reduce the computational burden due to massive data. Under some regularity conditions, we establish consistency and asymptotic normality of the subsample-based estimator and derive the optimal subsampling probabilities according to the L-optimality criterion. A two-step algorithm is developed to approximate the optimal subsampling procedure. The convergence rate and asymptotic normality of the two-step subsample estimator are also established. Numerical studies and two real data applications are carried out to evaluate the performance of our subsampling method.
{"title":"Optimal subsampling for multiplicative regression with massive data","authors":"Tianzhen Wang, Haixiang Zhang","doi":"10.1111/stan.12266","DOIUrl":"https://doi.org/10.1111/stan.12266","url":null,"abstract":"Faced with massive data, subsampling is a popular way to downsize the data volume for reducing computational burden. The key idea of subsampling is to perform statistical analysis on a representative subsample drawn from the full data. It provides a practical solution to extracting useful information from big data. In this article, we develop an efficient subsampling method for large‐scale multiplicative regression model, which can largely reduce the computational burden due to massive data. Under some regularity conditions, we establish consistency and asymptotic normality of the subsample‐based estimator, and derive the optimal subsampling probabilities according to the L‐optimality criterion. A two‐step algorithm is developed to approximate the optimal subsampling procedure. Meanwhile, the convergence rate and asymptotic normality of the two‐step subsample estimator are established. Numerical studies and two real data applications are carried out to evaluate the performance of our subsampling method.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"11 1","pages":"418 - 449"},"PeriodicalIF":1.5,"publicationDate":"2022-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89804072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this article, we consider the problem of change-point analysis for count time series data through an integer-valued autoregressive process of order 1 (INAR(1)) with time-varying covariates. We observe such features in many real-life scenarios, especially in COVID-19 data sets, where the number of active cases over time first falls and then increases again. In order to capture these features, we use a Poisson INAR(1) process with a time-varying smoothing covariate. Such a model lets us capture both components of the active cases at time point t, namely (i) the number of non-recovered cases carried over from the previous time point and (ii) the number of new cases at time point t. We study some theoretical properties of the proposed model along with forecasting. Simulation studies are performed to assess the effectiveness of the proposed method. Finally, we analyze two COVID-19 data sets and compare our proposed model with another PINAR(1) process that has a time-varying covariate but no change-point, to demonstrate the overall performance of our proposed model.
{"title":"Change-point analysis through integer-valued autoregressive process with application to some COVID-19 data.","authors":"Subhankar Chattopadhyay, Raju Maiti, Samarjit Das, Atanu Biswas","doi":"10.1111/stan.12251","DOIUrl":"https://doi.org/10.1111/stan.12251","url":null,"abstract":"<p><p>In this article, we consider the problem of change-point analysis for the count time series data through an integer-valued autoregressive process of order 1 (INAR(1)) with time-varying covariates. These types of features we observe in many real-life scenarios especially in the COVID-19 data sets, where the number of active cases over time starts falling and then again increases. In order to capture those features, we use Poisson INAR(1) process with a time-varying smoothing covariate. By using such model, we can model both the components in the active cases at time-point <i>t</i> namely, (i) number of nonrecovery cases from the previous time-point and (ii) number of new cases at time-point <i>t</i>. We study some theoretical properties of the proposed model along with forecasting. Some simulation studies are performed to study the effectiveness of the proposed method. Finally, we analyze two COVID-19 data sets and compare our proposed model with another PINAR(1) process which has time-varying covariate but no change-point, to demonstrate the overall performance of our proposed model.</p>","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"76 1","pages":"4-34"},"PeriodicalIF":1.5,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/stan.12251","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39154751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We developed a novel method to address multicollinearity in linear models, called average ordinary least squares (OLS)-centered penalized regression (AOPR). AOPR penalizes the cost function so as to shrink the estimators toward the weighted-average OLS estimator. The commonly used ridge regression (RR) shrinks the estimators toward zero; that is, in the Bayesian view it employs the penalization prior β∼N(0,1/k), which contradicts the common real prior β≠0. RR therefore selects small penalization coefficients to relieve this contradiction, which makes the penalization inadequate. Mathematical derivations suggest that AOPR can improve on the performance of RR and OLS regression. A simulation study shows that AOPR obtains more accurate estimators than OLS regression in most situations; it is more accurate than RR when the signs of the true βs are identical, and slightly less accurate than RR when the signs differ. Additionally, a case study shows that AOPR yields more stable estimators and stronger statistical power and predictive ability than RR and OLS regression. Based on these results, we recommend AOPR as a more efficient way to address multicollinearity than RR and OLS regression, especially when the true βs have identical signs.
{"title":"Average ordinary least squares‐centered penalized regression: A more efficient way to address multicollinearity than ridge regression","authors":"Wei Wang, Linjiang Li, Sheng Li, F. Yin, Fang Liao, Zhang Tao, Xiaosong Li, Xiong Xiao, Yue Ma","doi":"10.1111/stan.12263","DOIUrl":"https://doi.org/10.1111/stan.12263","url":null,"abstract":"We developed a novel method to address multicollinearity in linear models called average ordinary least squares (OLS)‐centered penalized regression (AOPR). AOPR penalizes the cost function to shrink the estimators toward the weighted‐average OLS estimator. The commonly used ridge regression (RR) shrinks the estimators toward zero, that is, employs penalization prior β∼N(0,1/k) in the Bayesian view, which contradicts the common real prior β≠0 . Therefore, RR selects small penalization coefficients to relieve such a contradiction and thus makes the penalizations inadequate. Mathematical derivations remind us that AOPR could increase the performance of RR and OLS regression. A simulation study shows that AOPR obtains more accurate estimators than OLS regression in most situations and more accurate estimators than RR when the signs of the true β s are identical and is slightly less accurate than RR when the signs of the true β s are different. Additionally, a case study shows that AOPR obtains more stable estimators and stronger statistical power and predictive ability than RR and OLS regression. Through these results, we recommend using AOPR to address multicollinearity more efficiently than RR and OLS regression, especially when the true β s have identical signs.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"88 1","pages":"347 - 368"},"PeriodicalIF":1.5,"publicationDate":"2022-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81131506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}