Nonparametric estimation of the quantiles of the conditional residual lifetime distribution
Steven Abrams, Paul Janssen, Noël Veraverbeke
Pub Date: 2026-01-06 | DOI: 10.1016/j.jspi.2026.106373
Journal of Statistical Planning and Inference, Volume 243, Article 106373
In medical research, interest often lies in the association between an event time T1 and a continuous covariate T2, or between two event times T1 and T2, where the event times are typically subject to (right) censoring. Although the strength of dependence between such random variables can be expressed in terms of global and local association measures, it is also of interest to study alternative quantities, such as percentiles of the residual lifetime distribution of T1, conditional on T2 taking values in a given interval. In this paper, we extend existing methods for estimating quantiles of the conditional residual lifetime distribution to accommodate a more flexible classification of subjects into subgroups based on their respective T2-values. More specifically, we propose two estimators, one under one-component censoring and one under univariate censoring, and provide a detailed study of their finite-sample performance. We demonstrate the use of these estimators on two medical datasets, concerning (1) monoclonal gammopathy of undetermined significance and (2) overall mortality in Danish twin members.
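As a purely illustrative sketch of the quantity studied above, the following computes an empirical quantile of the residual lifetime of T1 beyond a time t0, within a subgroup defined by T2-values. It assumes fully observed (uncensored) event times, whereas handling censoring is precisely the paper's contribution; the function name and interface are hypothetical.

```python
import numpy as np

def conditional_residual_quantile(t1, t2, t0, interval, q):
    """Empirical q-quantile of the residual lifetime T1 - t0, among
    subjects still at risk at t0 (T1 > t0) whose T2-value falls in the
    half-open interval [lo, hi).  Toy version: no censoring."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    lo, hi = interval
    mask = (t1 > t0) & (t2 >= lo) & (t2 < hi)
    if not mask.any():
        raise ValueError("no subjects in the requested subgroup")
    return np.quantile(t1[mask] - t0, q)

# subgroup t2 in [0, 5) with t1 > 2 has residual lifetimes 1, 2, 3
t1 = np.array([3.0, 4.0, 5.0, 1.0, 6.0])
t2 = np.array([1.0, 2.0, 3.0, 1.0, 9.0])
median_residual = conditional_residual_quantile(t1, t2, 2.0, (0.0, 5.0), 0.5)
```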
Robust estimation with Latin Hypercube Sampling: A Central Limit Theorem for Z-estimators
Faouzi Hakimi
Pub Date: 2026-01-06 | DOI: 10.1016/j.jspi.2026.106374
Journal of Statistical Planning and Inference, Volume 243, Article 106374
Latin Hypercube Sampling (LHS) is a widely used stratified sampling method in computer experiments. In this work, we extend existing convergence results for the sample mean under LHS to the broader class of Z-estimators, that is, estimators defined as the zeros of a sample mean function. We derive the asymptotic variance of these estimators and show that it is smaller under LHS than under traditional independent and identically distributed sampling. Furthermore, we establish a Central Limit Theorem for Z-estimators under LHS, providing a theoretical foundation for its improved efficiency.
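A minimal sketch of Latin Hypercube Sampling and the variance reduction it yields for a sample mean (the simplest Z-estimator, whose estimating equation is the zero of the sample mean of f(U) - θ). The function name and the additive test function are illustrative choices, not from the paper.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """n points in [0,1)^d: each margin is cut into n equal strata, one
    point per stratum, with an independent random permutation per dimension."""
    u = np.empty((n, d))
    for j in range(d):
        u[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return u

rng = np.random.default_rng(0)
n, reps = 50, 200
f = lambda u: u.sum(axis=1)  # additive test function; true mean = 1.0 in d = 2

iid_means = [f(rng.random((n, 2))).mean() for _ in range(reps)]
lhs_means = [f(latin_hypercube(n, 2, rng)).mean() for _ in range(reps)]
print(np.var(iid_means), np.var(lhs_means))  # LHS variance is far smaller
```

Additive functions are the best case for LHS, since stratifying each margin removes all main-effect variance from the estimate; for general integrands the gain is smaller but, per the results above, never a loss asymptotically.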
A class of mixed-level amplified designs and their space-filling properties
Zuohang Kang, Zujun Ou
Pub Date: 2025-12-19 | DOI: 10.1016/j.jspi.2025.106372
Journal of Statistical Planning and Inference, Volume 243, Article 106372
With the increasing complexity of experimental scenarios, mixed-level designs of large size are urgently needed. A class of mixed-level designs is constructed through amplification, which enlarges both the run size and the number of factors of an initial design. The space-filling properties of the amplified designs are studied under the generalized minimum aberration criterion, the wordlength enumerator, and the maximin L2-distance criterion; an attainable upper bound on the maximin L2-distance and a lower bound on the wordlength enumerator of amplified designs are obtained. Numerical examples demonstrate that the construction method is simple and effective, and it is recommended for high-dimensional statistical problems and large-scale experiments.
Homogeneity testing under finite mixtures of multivariate Poisson distributions
Guanfu Liu, Yuejiao Fu
Pub Date: 2025-11-27 | DOI: 10.1016/j.jspi.2025.106369
Journal of Statistical Planning and Inference, Volume 243, Article 106369
Finite mixtures of multivariate Poisson (FMMP) distributions have wide applications in the real world. Testing for homogeneity under FMMP models is important; however, to the best of our knowledge, there is no generic solution to this problem. In this paper, we propose an EM-test for homogeneity under FMMP models to fill this gap. We establish the strong consistency of the maximum likelihood estimator of the mixing distribution by relaxing two conditions required in the existing literature. The null limiting distribution of the proposed test is studied, and based on this limiting distribution, a resampling procedure is constructed to approximate the p-value of the test. The loss of strong identifiability for the multivariate Poisson distribution poses a significant challenge in deriving the null limiting distribution. Finally, simulation studies and a real-data analysis demonstrate the good performance of the proposed test.
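The paper's resampling procedure approximates the p-value from the null limiting distribution of the EM-test. A generic Monte Carlo p-value with the standard +1 finite-sample correction, which keeps the test valid for a finite number of resamples, might look as follows (the function name is hypothetical).

```python
import numpy as np

def resampling_pvalue(stat_obs, null_stats):
    """Monte Carlo p-value: the proportion of resampled null statistics at
    least as extreme as the observed one, with the +1 correction that keeps
    the estimate strictly positive and the test level-valid."""
    null_stats = np.asarray(null_stats, dtype=float)
    return (1 + np.sum(null_stats >= stat_obs)) / (1 + null_stats.size)
```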
On deriving Liouville process from Liouville distribution and its application in nonparametric Bayesian inference
Sadegh Chegini, Mahmoud Zarepour
Pub Date: 2025-11-26 | DOI: 10.1016/j.jspi.2025.106368
Journal of Statistical Planning and Inference, Volume 243, Article 106368
The Liouville distribution, a generalization of the Dirichlet distribution, serves as a well-known conjugate prior for the multinomial distribution. Just as the Dirichlet process is derived from the finite-dimensional Dirichlet distribution, it is natural and important to introduce and derive a Liouville process in a similar manner. We introduce a discrete random probability measure constructed from a random vector following a Liouville distribution and subsequently derive its weak limit to define our proposed Liouville process. The resulting process is a spike-and-slab process, where the Dirichlet process serves as the slab and a single point from its mean acts as the spike. These two components are linearly combined using a random weight generated from the Liouville distribution. By using the Liouville process as a prior on the space of probability measures, we derive the corresponding posterior process as well as the predictive distribution.
Semiparametric tests for Lorenz dominance based on density ratio model
Weiwei Zhuang, Weiqi Yang, Wenchen Liao, Yukun Liu
Pub Date: 2025-11-12 | DOI: 10.1016/j.jspi.2025.106361
Journal of Statistical Planning and Inference, Volume 242, Article 106361
Lorenz dominance is a fundamental tool for assessing whether wealth or income disparity is greater in one population than another. Based on the well-established density ratio model, we propose a new semiparametric test for Lorenz dominance. We show that the limiting distribution of the proposed test statistic is the supremum of a Gaussian process. To facilitate practical application, we devise a bootstrap procedure to calculate the p-value and establish its theoretical validity. Our simulation studies demonstrate that the proposed test correctly controls the Type I error and outperforms its competitors in terms of statistical power. Finally, we apply the test to compare salary distributions among higher education employees in Ohio from 2011 to 2015.
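The test above is semiparametric and built on the density ratio model; as a purely empirical illustration of the hypothesis being tested, the following sketch computes empirical Lorenz curves and checks pointwise dominance on a common grid (function names, the interpolation grid, and the tolerance are illustrative choices).

```python
import numpy as np

def lorenz_curve(x):
    """Points L(k/n), k = 0..n, of the empirical Lorenz curve: the
    cumulative share of the total held by the poorest k of n units."""
    x = np.sort(np.asarray(x, dtype=float))
    cum = np.cumsum(x)
    return np.concatenate(([0.0], cum / cum[-1]))

def lorenz_dominates(x, y, grid=None):
    """True if x's Lorenz curve lies weakly above y's on a common grid,
    i.e. x is no more unequal than y; curves are linearly interpolated."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    px = np.linspace(0.0, 1.0, len(x) + 1)
    py = np.linspace(0.0, 1.0, len(y) + 1)
    lx = np.interp(grid, px, lorenz_curve(x))
    ly = np.interp(grid, py, lorenz_curve(y))
    return bool(np.all(lx >= ly - 1e-12))
```

For example, a perfectly equal income vector has the diagonal as its Lorenz curve and therefore dominates any unequal one of the same total-share scale.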
Self-weighted estimation for nonstationary processes with infinite variance GARCH errors
Yuze Yuan, Shuyu Liu, Rongmao Zhang
Pub Date: 2025-11-08 | DOI: 10.1016/j.jspi.2025.106360
Journal of Statistical Planning and Inference, Volume 242, Article 106360
Zhang and Chan (2021) considered the augmented Dickey–Fuller (ADF) test for a unit root process with linear noise driven by generalized autoregressive conditional heteroskedasticity (GARCH), and showed that the ADF test may perform even worse than the Dickey–Fuller test. The main reason is that, with infinite-variance GARCH noise, the parameters of the lag terms in the ADF regression cannot be estimated consistently by least squares estimation (LSE). In this paper, we propose a self-weighted least squares estimation (SWLSE) procedure to solve this problem, together with a new SWLSE-based test for the unit root. We show that the SWLSE is consistent, and that the proposed test statistic converges to a functional of a stable process and a Brownian motion and performs well in terms of size and power. A simulation study is conducted to evaluate the performance of our procedure, and a real-world illustrative example is provided.
Mixed latent graphical models with mixed measurement error and misclassification in variables
Yu Shi, Grace Y. Yi
Pub Date: 2025-11-01 | DOI: 10.1016/j.jspi.2025.106359
Journal of Statistical Planning and Inference, Volume 242, Article 106359
Graphical models are powerful tools for characterizing conditional dependence structures among variables with complex relationships. Although many methods have been developed under the graphical modeling framework, their validity often hinges on the quality of the data. A fundamental assumption in most existing approaches is that all variables are measured precisely, an assumption frequently violated in practice. In many applications, mismeasurement of mixed discrete and continuous variables is a common challenge. In this paper, we address error-contaminated data involving both continuous and discrete variables by proposing a mixed latent Gaussian copula graphical measurement error model. To perform inference, we develop a simulation-based expectation–maximization procedure that explicitly accounts for mismeasurement effects. We further introduce a computationally efficient refinement to reduce the computational burden. Asymptotic properties of the proposed estimator are established, and its finite-sample performance is evaluated through numerical studies.
General sliced minimum aberration designs for multi-platform experiments
Yuliang Zhou, Qianqian Zhao, Shengli Zhao
Pub Date: 2025-10-30 | DOI: 10.1016/j.jspi.2025.106357
Journal of Statistical Planning and Inference, Volume 242, Article 106357
Sliced designs are widely used in multi-platform experiments. A sliced design contains several sub-designs, divided according to the sliced factor, with each sub-design assigned to its own platform. In some experimental scenarios it is necessary to consider the optimality of both the sub-designs and the complete sliced design; such sliced designs are referred to as general sliced (GS) designs. To construct optimal GS designs for such scenarios, we propose the general sliced effect hierarchy principle (GSEHP). Based on the GSEHP, we introduce the general sliced minimum aberration (GSMA) criterion and choose GSMA designs as the optimal GS designs when the sliced factor and the design factors are equally important. Some GSMA designs with 32 and 64 runs are tabulated. Additionally, we present a practical example illustrating the use of GSMA designs to guide webpage-setting strategies on two platforms.
Robust and consistent model evaluation criteria in high-dimensional regression
Sumito Kurata, Kei Hirose
Pub Date: 2025-10-28 | DOI: 10.1016/j.jspi.2025.106358
Journal of Statistical Planning and Inference, Volume 242, Article 106358
Most regularization methods, such as the LASSO, involve one or more regularization parameters, and selecting the value of a regularization parameter is essentially equivalent to selecting a model. Thus, to obtain a model suitable for the data and the phenomenon under study, we need to determine an adequate value of the regularization parameter. For choosing the regularization parameter in the linear regression model, information criteria such as the AIC and BIC are often applied; however, it has been pointed out that these criteria are sensitive to outliers and tend not to perform well in high-dimensional settings. Outliers generally have a negative effect not only on estimation but also on model selection, so it is important to employ a selection method that is robust against outliers. In addition, when the number of explanatory variables is very large, most conventional criteria are prone to selecting unnecessary explanatory variables. In this paper, we propose model evaluation criteria, based on statistical divergences and a quasi-Bayesian procedure, that are robust in both parameter estimation and model selection. Owing to a precise approximation, the proposed criteria achieve selection consistency even in high-dimensional settings, simultaneously with robustness. We also investigate the conditions under which robustness and consistency hold, and provide a concrete example of a divergence and penalty term achieving these desirable properties. Finally, we report numerical examples verifying that the proposed criteria perform robust and consistent variable selection compared with conventional selection methods.
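For contrast with the robust criteria proposed above, here is a minimal sketch of the conventional baseline: exhaustive best-subset selection by BIC in a linear model with intercept. Names are illustrative, and this is exactly the kind of criterion the paper argues is non-robust to outliers and unreliable as the number of variables grows.

```python
import numpy as np
from itertools import combinations

def bic_best_subset(X, y):
    """Fit OLS on every subset of columns of X (plus an intercept) and
    return the subset minimizing BIC = n*log(RSS/n) + (#params)*log(n).
    Exhaustive search: only feasible for small numbers of candidates."""
    n, p = X.shape
    best_bic, best_subset = np.inf, ()
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            Z = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if bic < best_bic:
                best_bic, best_subset = bic, subset
    return best_subset, best_bic

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=80)  # only column 0 is active
subset, bic = bic_best_subset(X, y)
```

A single outlier in y can shift the RSS enough to change the selected subset, which is the fragility the divergence-based criteria are designed to avoid.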