Pub Date: 2022-07-26 DOI: 10.1007/s10463-022-00841-7
Min Tsao
Traditionally, the main focus of least squares regression is to study the effects of individual predictor variables, but strongly correlated variables generate multicollinearity, which makes it difficult to study their effects. To resolve the multicollinearity issue without abandoning least squares regression, for situations where predictor variables are in groups with strong within-group correlations but weak between-group correlations, we propose to study the effects of the groups with a group approach to least squares regression. Using an all positive correlations arrangement of the strongly correlated variables, we first characterize group effects that are meaningful and can be accurately estimated. We then discuss the group approach to least squares regression through a simulation study and demonstrate that it is an effective method for handling multicollinearity. We also address a common misconception about the prediction accuracy of the least squares estimated model.
Title: Group least squares regression for linear models with strongly correlated predictor variables. Journal: Annals of the Institute of Statistical Mathematics.
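The contrast between poorly estimated individual effects and a well estimated group effect can be illustrated numerically. The sketch below is not the paper's procedure: it assumes a hypothetical design with two standardized predictors of correlation `rho`, takes the plain average of the two coefficients as a stand-in for a group effect, and compares variances read off the usual OLS covariance formula. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def coefficient_and_group_variances(rho=0.99, n=200, sigma=1.0, seed=0):
    """Return (variances of the two OLS coefficients, variance of their average).

    Hypothetical two-predictor design; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # Standardize the columns so both predictors are on the same scale.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance matrix of the OLS estimator: sigma^2 (X'X)^{-1}.
    cov_beta = sigma**2 * np.linalg.inv(X.T @ X)
    # Equal weights: the average of the two coefficients (assumed group effect).
    w = np.array([0.5, 0.5])
    return np.diag(cov_beta), float(w @ cov_beta @ w)

individual_var, group_var = coefficient_and_group_variances()
print("individual coefficient variances:", individual_var)
print("variance of the averaged effect:", group_var)
```

For standardized predictors with sample correlation r, the ratio of the two variances is 2 / (1 - r), so the averaged effect becomes relatively easier to estimate as the correlation grows.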
Pub Date: 2022-07-15 DOI: 10.1007/s10463-022-00840-8
Suneel Babu Chatla
We investigate hypothesis testing in nonparametric additive models estimated using simplified smooth backfitting (Huang and Yu, Journal of Computational and Graphical Statistics, 28(2), 386–400, 2019). Simplified smooth backfitting achieves oracle properties under regularity conditions and provides closed-form expressions of the estimators that are useful for deriving asymptotic properties. We develop a testing framework for inference based on a generalized likelihood ratio (GLR) (Fan, Zhang and Zhang, Annals of Statistics, 29(1), 153–193, 2001) and a loss function (LF) (Hong and Lee, Annals of Statistics, 41(3), 1166–1203, 2013). Under the null hypothesis, both the GLR and LF tests have asymptotically rescaled chi-squared distributions, and both exhibit the Wilks phenomenon, which means the scaling constants and degrees of freedom are independent of nuisance parameters. These tests are asymptotically optimal in terms of rates of convergence for nonparametric hypothesis testing. Additionally, the bandwidths that are well suited for model estimation may be useful for testing. We show that in additive models, the LF test is asymptotically more powerful than the GLR test. We use simulations to demonstrate the Wilks phenomenon and the power of the proposed GLR and LF tests, and a real example to illustrate their usefulness.
Title: Nonparametric inference for additive models estimated via simplified smooth backfitting. Journal: Annals of the Institute of Statistical Mathematics.
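For orientation, the generic GLR statistic of Fan, Zhang and Zhang (2001) compares the residual sums of squares of the null and alternative fits. The display below sketches that general form only; it is not the specific statistic or the constants derived in this paper. Here $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$ denote the residual sums of squares under the null and the nonparametric alternative, and $r_K$ and $\mu_n$ stand for the scaling constant and the degrees of freedom.

```latex
\lambda_n = \frac{n}{2}\,\log\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1},
\qquad
r_K\,\lambda_n \;\overset{a}{\sim}\; \chi^2_{\mu_n}
```

The Wilks phenomenon is the statement that $r_K$ and $\mu_n$ do not depend on nuisance parameters, so the null distribution can be calibrated without knowing them.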
Pub Date: 2022-06-28 DOI: 10.1007/s10463-022-00837-3
Vo Nguyen Le Duy, Ichiro Takeuchi
In this paper, we study statistical inference for the Wasserstein distance, which has attracted much attention and has been applied to various machine learning tasks. Several approaches have been proposed in the literature, but almost all of them are based on asymptotic approximation and do not have finite-sample validity. In this study, we propose an exact (non-asymptotic) inference method for the Wasserstein distance inspired by the concept of conditional selective inference (SI). To our knowledge, this is the first method that can provide a valid confidence interval (CI) for the Wasserstein distance with a finite-sample coverage guarantee, and it can be applied not only to one-dimensional problems but also to multi-dimensional problems. We evaluate the performance of the proposed method on both synthetic and real-world datasets.
Title: Exact statistical inference for the Wasserstein distance by selective inference. Journal: Annals of the Institute of Statistical Mathematics.
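As a reminder of the quantity being inferred, the following sketch computes the empirical Wasserstein distance between two one-dimensional samples of equal size, where the optimal coupling simply matches sorted values. It gives a point estimate only and does not implement the selective inference procedure of the paper; the function name `wasserstein_1d` and the equal-size restriction are assumptions made for brevity.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """Empirical p-Wasserstein distance between two equal-size 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal transport plan pairs the order statistics.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

print(wasserstein_1d([0.0, 1.0, 3.0], [8.0, 5.0, 6.0]))
```

In more than one dimension no such sorting shortcut exists, and the distance is usually obtained by solving a linear program.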
Pub Date: 2022-06-28 DOI: 10.1007/s10463-022-00839-1
Yuri Goegebeur, Armelle Guillou, Jing Qin
We propose a robust estimator of the stable tail dependence function in the case where random covariates are recorded. Under suitable assumptions, we derive the finite-dimensional weak convergence of the properly normalized estimator. The performance of our estimator in terms of efficiency and robustness is illustrated through a simulation study. Our methodology is applied to a real dataset of sale prices of residential properties.
Title: Robust estimation of the conditional stable tail dependence function. Journal: Annals of the Institute of Statistical Mathematics. Open access PDF: https://link.springer.com/content/pdf/10.1007/s10463-022-00839-1.pdf
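To make the target of estimation concrete, here is a sketch of the classical rank-based empirical estimator of the stable tail dependence function, without covariates and without any robustification. It is not the estimator proposed in the paper. The function name `empirical_stdf`, the tuning parameter `k` (number of upper order statistics) and the assumption of no ties in the data are illustrative choices.

```python
import numpy as np

def empirical_stdf(data, x, k):
    """Rank-based estimate of L(x) from an (n, d) sample, assuming no ties."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Column-wise ranks from 1 (smallest) to n (largest).
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    # Component j counts as extreme when its rank exceeds n + 1/2 - k * x_j.
    thresholds = n + 0.5 - k * np.asarray(x, dtype=float)
    exceeds = (ranks > thresholds).any(axis=1)
    return exceeds.sum() / k

t = np.arange(100.0)
print(empirical_stdf(np.column_stack([t, t]), (1.0, 1.0), k=10))
print(empirical_stdf(np.column_stack([t, -t]), (1.0, 1.0), k=10))
```

The two calls cover the boundary cases: perfectly dependent components give max(x1, x2), while components whose extremes never coincide give x1 + x2.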
Pub Date: 2022-06-10 DOI: 10.1007/s10463-022-00836-4
Lyu Ni, Jun Shao
To estimate unknown population parameters based on \(\boldsymbol{y}\), a vector of multivariate outcomes having nonignorable item nonresponse that directly depends on \(\boldsymbol{y}\), we propose an innovative inverse propensity weighting approach when the joint distribution of \(\boldsymbol{y}\) and associated covariate \(\boldsymbol{x}\) is nonparametric and the nonresponse probability conditional on \(\boldsymbol{y}\) and \(\boldsymbol{x}\) has a parametric form. To deal with the identifiability issue, we utilize a nonresponse instrument \(\boldsymbol{z}\), an auxiliary variable related to \(\boldsymbol{y}\) but not related to the nonresponse probability conditional on \(\boldsymbol{y}\) and \(\boldsymbol{x}\). We utilize a modified generalized method of moments to obtain estimators of the parameters in the nonresponse probability. Simulation results are presented and an application is illustrated in a real data set.
Title: Estimation with multivariate outcomes having nonignorable item nonresponse. Journal: Annals of the Institute of Statistical Mathematics.
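The weighting step itself can be sketched in a few lines. The example below estimates a population mean from a single outcome by weighting each respondent by the inverse of its response probability, using the normalized (Hájek-type) form. It assumes the response probabilities are already available; in the paper they are estimated by a modified generalized method of moments with a nonresponse instrument, which is not reproduced here. The function name `ipw_mean` and its arguments are hypothetical.

```python
import numpy as np

def ipw_mean(y, observed, propensity):
    """Normalized inverse-propensity-weighted mean of y.

    y          : outcomes; entries for nonrespondents may be NaN
    observed   : True where the outcome was observed
    propensity : response probability of each unit (assumed known here)
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    # Nonrespondents receive weight zero, respondents 1 / propensity.
    weights = observed / propensity
    # Replace missing outcomes by 0 so that NaN does not propagate.
    y_filled = np.where(observed, y, 0.0)
    return float(np.sum(weights * y_filled) / np.sum(weights))

print(ipw_mean([0.0, 10.0], [True, True], [0.5, 0.25]))
```

Units that are less likely to respond receive larger weights, which is how the estimator compensates for nonresponse that depends on the outcome.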
Pub Date: 2022-05-28 DOI: 10.1007/s10463-022-00829-3
Masataka Taguri
Title: Discussion of “Akaike Memorial Lecture 2020: Some of the challenges of statistical applications”. Journal: Annals of the Institute of Statistical Mathematics.
Pub Date: 2022-05-26 DOI: 10.1007/s10463-022-00833-7
Masayuki Henmi
Title: Discussion of Akaike Memorial Lecture 2020: Some of the challenges of statistical applications. Journal: Annals of the Institute of Statistical Mathematics. Open access PDF: https://link.springer.com/content/pdf/10.1007/s10463-022-00833-7.pdf
Pub Date: 2022-05-25 DOI: 10.1007/s10463-022-00831-9
John Copas
There has always been a close link between statistical applications and the development of new statistical theory and methods. Even straightforward applications of standard methods can give rise to theoretical challenges leading to new statistical ideas. In my lecture, I will briefly review a few of the statistical developments in my own published papers and describe the applications which gave rise to them. I will then outline some current work on publication bias, one of the outstanding problems in the interpretation of literature reviews, particularly in the medical sciences.
Title: Akaike Memorial Lecture 2020: Some of the challenges of statistical applications. Journal: Annals of the Institute of Statistical Mathematics.
Pub Date: 2022-05-24 DOI: 10.1007/s10463-022-00835-5
Jingjing Wu, Tasnima Abedin, Qiang Zhao
In this work, we studied a two-component mixture model with a stochastic dominance constraint, a model arising naturally from many genetic studies. To model the stochastic dominance, we proposed a semiparametric model for the log of the density ratio. More specifically, when the log of the ratio of the two component densities is in a linear regression form, the stochastic dominance is immediately satisfied. For the resulting semiparametric mixture model, we proposed two estimators, the maximum empirical likelihood estimator (MELE) and the minimum Hellinger distance estimator (MHDE), and investigated their asymptotic properties such as consistency and normality. In addition, to test the validity of the proposed semiparametric model, we developed Kolmogorov–Smirnov type tests based on the two estimators. The finite-sample performance of the two estimators and the tests, in terms of both efficiency and robustness, was examined and compared via both thorough Monte Carlo simulation studies and real data analysis.
Title: Semiparametric modelling of two-component mixtures with stochastic dominance. Journal: Annals of the Institute of Statistical Mathematics. Open access PDF: https://link.springer.com/content/pdf/10.1007/s10463-022-00835-5.pdf
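The link between a linear log density ratio and stochastic dominance can be checked on a simple parametric case. The sketch below assumes, purely for illustration, that the first component density f is N(0, 1) and the second is g(x) = exp(alpha + beta * x) f(x); with beta = mu and alpha = -mu^2 / 2 this g is exactly the N(mu, 1) density, and for mu > 0 its distribution function lies below that of f everywhere. The normal components and the value of `mu` are assumptions and are not taken from the paper.

```python
import math

def phi(x, mu=0.0):
    """Density of N(mu, 1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def Phi(x, mu=0.0):
    """Distribution function of N(mu, 1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def tilted_density(x, alpha, beta):
    """g(x) = exp(alpha + beta * x) * f(x), with f the N(0, 1) density."""
    return math.exp(alpha + beta * x) * phi(x)

mu = 1.5  # hypothetical location shift of the second component
alpha, beta = -0.5 * mu**2, mu
grid = [i / 10.0 for i in range(-50, 51)]

# The tilted density coincides with the N(mu, 1) density on the grid.
max_density_gap = max(abs(tilted_density(x, alpha, beta) - phi(x, mu)) for x in grid)
# Stochastic dominance: G(x) <= F(x) at every grid point.
dominance_holds = all(Phi(x, mu) <= Phi(x) for x in grid)
print(max_density_gap, dominance_holds)
```

A positive slope makes the density ratio increasing in x, which is the monotone likelihood ratio property that implies the dominance checked here.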
Pub Date: 2022-05-21 DOI: 10.1007/s10463-022-00832-8
Jinhui Guo, Yingyin Lu
For Gaussian stationary triangular arrays, it is well known that the extreme values may occur in clusters. Here we consider the joint behavior of the point processes of clusters and the partial sums of bivariate stationary Gaussian triangular arrays. For a bivariate stationary Gaussian triangular array, we derive the asymptotic joint behavior of the point processes of clusters and prove that the point processes and partial sums are asymptotically independent. As an immediate consequence of the results, one may obtain the asymptotic joint distributions of the extremes and partial sums. We illustrate the theoretical findings with a numerical example.
Title: Joint behavior of point processes of clusters and partial sums for stationary bivariate Gaussian triangular arrays. Journal: Annals of the Institute of Statistical Mathematics. Open access PDF: https://link.springer.com/content/pdf/10.1007/s10463-022-00832-8.pdf