This paper develops a new test on a covariance matrix that is based on its nonlinear shrinkage estimator. The distribution of the test statistic under the null hypothesis is derived in the large‐dimensional setting, that is, when the number of variables and the sample size both tend to infinity. The theoretical results are illustrated by an extensive simulation study in which the new nonlinear shrinkage‐based test is compared with existing approaches, in particular the commonly used corrected likelihood ratio test, the corrected John test, and the test based on the linear shrinkage approach. It is demonstrated that the new nonlinear shrinkage test has better power properties under heteroscedastic alternatives.
Nonlinear shrinkage test on a large‐dimensional covariance matrix. Taras Bodnar, Nestor Parolya, Frederik Veldman. Statistica Neerlandica, 2024-07-17. https://doi.org/10.1111/stan.12348
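As a point of reference for one of the benchmarks named above, the following Python sketch computes John's sphericity statistic $U$ and applies the large‐dimensional correction as we understand it from Ledoit and Wolf (2002), namely $nU - p \to N(1, 4)$ under $H_0\colon \Sigma = \sigma^2 I$ when $p/n$ converges to a positive constant. It assumes zero‐mean data, so that $S = X^\top X / n$, and it is not the nonlinear shrinkage test of the paper; the function names are hypothetical.

```python
import math

import numpy as np


def john_statistic(X: np.ndarray) -> float:
    """John's statistic U = (1/p) tr[(S / (tr(S)/p) - I)^2].

    Assumes the rows of X are observations with known zero mean,
    so S = X^T X / n without centering.
    """
    n, p = X.shape
    S = X.T @ X / n
    m = np.trace(S) / p                  # average eigenvalue of S
    D = S / m - np.eye(p)
    return float(np.trace(D @ D) / p)


def corrected_john_test(X: np.ndarray) -> tuple[float, float]:
    """Corrected John test of sphericity, H0: Sigma = sigma^2 I.

    Standardizes n*U - p with the N(1, 4) limit and rejects for large U,
    so the p-value is an upper-tail normal probability.
    """
    n, p = X.shape
    z = (n * john_statistic(X) - p - 1.0) / 2.0
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


rng = np.random.default_rng(0)
n, p = 200, 50
X0 = rng.standard_normal((n, p))                     # spherical: H0 holds
scales = np.where(np.arange(p) < p // 2, 1.0, 2.0)   # heteroscedastic alternative
X1 = rng.standard_normal((n, p)) * scales
print(corrected_john_test(X0))
print(corrected_john_test(X1))
```

The statistic is scale invariant, so only the shape of $\Sigma$ matters, which is why heteroscedastic variances drive it away from its null behavior.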
U‐statistics constitute a large class of estimators that generalize the empirical mean of a random variable $X$ to sums over every $k$‐tuple of distinct observations of $X$. They may be used to estimate a regular functional of the law of $X$. When a vector of covariates $Z$ is available, a conditional U‐statistic describes the effect of $z$ on the conditional law of $X$ given $Z = z$ by estimating a regular conditional functional. We state nonasymptotic bounds for general conditional U‐statistics and also study their asymptotics. Assuming a parametric model for the conditional functional of interest, we propose a regression‐type estimator based on conditional U‐statistics. Its theoretical properties are derived, first in a nonasymptotic framework and then in two different asymptotic regimes. Some examples are given to illustrate our methods.
Estimation of a regular conditional functional by conditional U‐statistic regression. Alexis Derumigny. Statistica Neerlandica, 2024-07-15. https://doi.org/10.1111/stan.12350
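A minimal Python sketch of the two objects described above, for illustration only: `u_statistic` averages a kernel over every $k$‐tuple of distinct observations, and `conditional_u_statistic` reweights each tuple by products of Gaussian kernel weights in the covariate, a Nadaraya–Watson‐type construction in the spirit of Stute (1991). The kernel, bandwidth, and function names are our assumptions, and the paper's estimators may weight tuples differently. With the kernel $(a-b)^2/2$, the unconditional version reproduces the unbiased sample variance.

```python
import itertools

import numpy as np


def u_statistic(x, kernel, k=2):
    """Average of kernel over every k-tuple of distinct observations of x."""
    tuples = itertools.permutations(range(len(x)), k)
    return float(np.mean([kernel(*(x[i] for i in idx)) for idx in tuples]))


def conditional_u_statistic(x, z, z0, kernel, h, k=2):
    """Kernel-weighted U-statistic targeting the conditional functional at Z = z0.

    Each k-tuple is weighted by the product of Gaussian weights
    K((z_i - z0) / h) of its members (Nadaraya-Watson-type weighting).
    """
    w = np.exp(-0.5 * ((np.asarray(z) - z0) / h) ** 2)
    num = den = 0.0
    for idx in itertools.permutations(range(len(x)), k):
        weight = np.prod(w[list(idx)])
        num += weight * kernel(*(x[i] for i in idx))
        den += weight
    return num / den


def var_kernel(a, b):
    """Symmetric kernel whose U-statistic is the unbiased variance estimator."""
    return 0.5 * (a - b) ** 2


rng = np.random.default_rng(1)
z = rng.uniform(size=40)
x = rng.normal(scale=1.0 + 2.0 * z)          # spread of X grows with Z
print(u_statistic(x, var_kernel))             # same value as np.var(x, ddof=1)
print(conditional_u_statistic(x, z, 0.1, var_kernel, h=0.1))
print(conditional_u_statistic(x, z, 0.9, var_kernel, h=0.1))
```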
Presmoothing was initially introduced in the linear regression setting as a method to improve finite‐sample efficiency by replacing the response variable with a nonparametric estimate of the regression function. Since then, it has been applied successfully in various domains, including survival analysis. However, presmoothing with multiple continuous covariates is challenging and undesirable in practice. Inspired by the cure regression setup, we derive a simple estimator for (semi)parametric models with many regressors based on one‐dimensional presmoothing. The method is particularly valuable when the response variable is not directly observed. Even when the response is available, however, presmoothing can improve accuracy for small to moderate sample sizes. We present several applications of the proposed method in different settings and investigate its finite‐sample behavior through simulations.
Regression estimation using surrogate responses obtained by presmoothing. Eni Musta, Valentin Patilea, Ingrid Van Keilegom. Statistica Neerlandica, 2024-07-11. https://doi.org/10.1111/stan.12351
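The original presmoothing idea for linear regression can be sketched in a few lines of Python: replace each response by a Nadaraya–Watson estimate of the regression function at its covariate value, then run ordinary least squares on these surrogate responses. This is a single‐covariate illustration under assumed choices (Gaussian kernel, fixed bandwidth `h`, hypothetical function names); it does not reproduce the paper's one‐dimensional presmoothing scheme for many regressors.

```python
import numpy as np


def nadaraya_watson(x, y, h):
    """Gaussian-kernel estimate of E[Y | X = x_i] at every sample point."""
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)            # row i holds the weights used at x_i
    return (w @ y) / w.sum(axis=1)


def presmoothed_ols(x, y, h):
    """OLS of the presmoothed (surrogate) responses on an intercept and x."""
    y_tilde = nadaraya_watson(x, y, h)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y_tilde, rcond=None)
    return coef


rng = np.random.default_rng(2)
x = rng.uniform(size=500)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=500)
print(presmoothed_ols(x, y, h=0.05))     # intercept and slope estimates
```

As the bandwidth shrinks to zero the smoother returns the raw responses and the fit collapses to plain OLS, so `h` controls how much of the noise is averaged away before the parametric fit.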
The semi‐continuous time series under study contains a nonnegligible proportion of observations equal to a single value (typically zero), whereas the remaining outcomes are strictly positive. A novel class of hurdle GARCH models with dependent zero occurrences is considered, and classical maximum likelihood estimation is employed. However, the distribution of the underlying innovations does not belong to the exponential family, which, together with the dependence of the innovations, makes the whole inference nonstandard. Consistency and asymptotic normality of the estimator are derived. The efficiency of the estimator is examined and compared with that of the alternative quasi‐likelihood approach. Bootstrap prediction is also discussed. Finally, sparse non‐life insurance claims are analyzed.
Hurdle GARCH models for nonnegative time series. Šárka Hudecová, Michal Pešta. Statistica Neerlandica, 2024-07-11. https://doi.org/10.1111/stan.12349
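To make the idea of a hurdle process with dependent zeros concrete, here is a Python simulation of one hypothetical specification; it is our own illustrative choice, not the model class or likelihood of the paper. Zero occurrences follow a two‐state Markov chain, the positive part is $\sigma_t \eta_t$ with $\eta_t \sim \mathrm{Exp}(1)$, and $\sigma_t^2$ follows a GARCH(1,1)‐type recursion driven by past observations. Under this chain the long‐run share of zeros is $p_{10}/(p_{10} + 1 - p_{00})$.

```python
import numpy as np


def simulate_hurdle_garch(n, omega=0.1, alpha=0.1, beta=0.8, p00=0.7, p10=0.2, seed=0):
    """Simulate a hypothetical hurdle process Y_t = sigma_t * B_t * eta_t.

    B_t in {0, 1} is a Markov chain (dependent zero occurrences):
        P(B_t = 0 | B_{t-1} = 0) = p00,  P(B_t = 0 | B_{t-1} = 1) = p10.
    eta_t ~ Exp(1) gives the strictly positive part, and the scale follows
        sigma_t^2 = omega + alpha * Y_{t-1}^2 + beta * sigma_{t-1}^2.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    b = np.zeros(n, dtype=int)
    sigma2 = omega
    prev_b, prev_y = 1, 0.0
    for t in range(n):
        sigma2 = omega + alpha * prev_y ** 2 + beta * sigma2
        p_zero = p00 if prev_b == 0 else p10
        b[t] = 0 if rng.random() < p_zero else 1
        y[t] = np.sqrt(sigma2) * b[t] * rng.exponential(1.0)
        prev_b, prev_y = b[t], y[t]
    return y, b


y, b = simulate_hurdle_garch(20_000)
print("share of zeros:", np.mean(y == 0))   # theoretical long-run share: 0.2 / 0.5 = 0.4
```

Because the positive part follows a continuous distribution while zeros arrive through a dependent binary chain, the joint innovation law is a mixture with a point mass, which is one way to see why standard exponential‐family arguments do not apply.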
Using the cluster process representation of the self‐exciting process, a model is derived that allows the excitation effects of terrorist events to vary within a self‐exciting (cluster) process model. The derivation and implementation details of the model are given, and the model is applied to data from the Global Terrorism Database (National Consortium for the Study of Terrorism and Responses to Terrorism (START), 2015) covering 2000 to 2013. The practical interpretation of the results and their implications for a theoretical model paralleling existing criminological theory are discussed.
Endogenous and exogenous effects in self‐exciting process models of terrorist activity. Fabrizio Ruggeri, Michael D. Porter, Gentry White. Statistica Neerlandica, 2024-07-08. https://doi.org/10.1111/stan.12347
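For orientation, the Python sketch below simulates a standard self‐exciting (Hawkes) process with an exponential excitation kernel through the cluster (branching) representation: immigrants arrive at rate $\mu$, and each event triggers on average $\alpha$ offspring after exponential delays with rate $\beta$. The excitation effect here is the same for every event, which is precisely the restriction the paper relaxes; parameter names and values are hypothetical.

```python
import numpy as np


def simulate_hawkes_cluster(mu, alpha, beta, T, seed=0):
    """Simulate an exponential-kernel Hawkes process on [0, T] via branching.

    Immigrants form a Poisson process with rate mu. Every event (immigrant or
    offspring) has Poisson(alpha) children with Exponential(beta) delays,
    matching lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta (t - t_i)).
    Requires alpha < 1 so that every cluster is finite.
    """
    rng = np.random.default_rng(seed)
    generation = list(rng.uniform(0.0, T, size=rng.poisson(mu * T)))
    events = list(generation)
    while generation:
        children = []
        for parent in generation:
            delays = rng.exponential(1.0 / beta, size=rng.poisson(alpha))
            children.extend(t for t in parent + delays if t <= T)
        events.extend(children)
        generation = children
    return np.sort(np.array(events))


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t given the event history."""
    past = events[events < t]
    return mu + alpha * beta * np.exp(-beta * (t - past)).sum()


mu, alpha, beta, T = 0.5, 0.5, 1.0, 2000.0
events = simulate_hawkes_cluster(mu, alpha, beta, T)
print(len(events), mu * T / (1 - alpha))   # simulated count vs. approximate expected count
print(intensity(10.0, events, mu, alpha, beta))
```

The branching view makes the endogenous/exogenous split explicit: immigrants are exogenous events, and offspring are endogenous events triggered by earlier activity.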
Michael T. Gorczyca, Tavish M. McDonald, Justice D. Sefas
In this note, we study how parameter vector estimation for a trigonometric regression model and the expected squared residual error computed from an estimated model are affected by Berkson‐type measurement error. Closed‐form expressions for the parameter vector and the expected squared residual error are obtained by assuming that the observed covariate data are sampled from an equispaced design and that measurement error is generated from a symmetric probability distribution with a mean of zero. Notably, these results indicate that estimates of the amplitude parameters for a trigonometric regression model suffer from attenuation bias when covariate data are mis‐measured, and that estimates of the phase‐shift parameters are unbiased.
A note on trigonometric regression in the presence of Berkson‐type measurement error. Michael T. Gorczyca, Tavish M. McDonald, Justice D. Sefas. Statistica Neerlandica, 2024-07-05. https://doi.org/10.1111/stan.12344
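A short derivation shows where the attenuation comes from, under assumptions we state explicitly (they may be narrower than, or differ from, the note's setup): the true covariate is $t_i + U_i$, where $t_i$ is the observed design point and $U_i$ is symmetric with mean zero and independent of $t_i$ and of the noise $\varepsilon_i$, and the model uses cosine and sine regressors at a known frequency $\omega$.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 \cos\{\omega(t_i + U_i)\} + \beta_2 \sin\{\omega(t_i + U_i)\} + \varepsilon_i,
  \qquad \kappa := \mathbb{E}[\cos(\omega U_i)], \quad \mathbb{E}[\sin(\omega U_i)] = 0, \\
\mathbb{E}[\cos\{\omega(t_i + U_i)\}] &= \kappa \cos(\omega t_i) - \mathbb{E}[\sin(\omega U_i)] \sin(\omega t_i) = \kappa \cos(\omega t_i), \\
\mathbb{E}[\sin\{\omega(t_i + U_i)\}] &= \kappa \sin(\omega t_i) + \mathbb{E}[\sin(\omega U_i)] \cos(\omega t_i) = \kappa \sin(\omega t_i), \\
\mathbb{E}[y_i \mid t_i] &= \beta_0 + \kappa \beta_1 \cos(\omega t_i) + \kappa \beta_2 \sin(\omega t_i).
\end{align*}
```

Least squares on the observed regressors $(1, \cos(\omega t_i), \sin(\omega t_i))$ therefore targets $(\beta_0, \kappa\beta_1, \kappa\beta_2)$ for any full‐rank design. Because both trigonometric coefficients are multiplied by the same factor, the amplitude $\sqrt{\beta_1^2 + \beta_2^2}$ is shrunk by $|\kappa| \le 1$, while the phase $\operatorname{atan2}(\beta_2, \beta_1)$ is unchanged whenever $\kappa > 0$, for example for Gaussian errors with variance $\sigma_U^2$, where $\kappa = e^{-\omega^2 \sigma_U^2 / 2}$.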
Michał G. Ciszewski, Jakob Söhl, Ton Leenen, Bart van Trigt, Geurt Jongbloed
Often the question arises whether $Y$ can be predicted from $X$ using a certain model. Especially for highly flexible models such as neural networks, one may ask whether a seemingly good prediction is actually better than fitting pure noise or whether it has to be attributed to the flexibility of the model. This paper proposes a rigorous permutation test to assess whether the prediction is better than the prediction of pure noise. The test avoids any sample splitting and is instead based on generating new pairings of the observed $X$ and $Y$ values. It introduces a new formulation of the null hypothesis and a rigorous justification for the test, which distinguishes it from the previous literature. The theoretical findings are applied both to simulated data and to sensor data of tennis serves collected in an experimental context. The simulation study underscores how the available information affects the test: the less informative the predictors, the lower the probability of rejecting the null hypothesis of fitting pure noise, and detecting weaker dependence between variables requires a sufficient sample size.
Testing for no effect in regression problems: A permutation approach. Michał G. Ciszewski, Jakob Söhl, Ton Leenen, Bart van Trigt, Geurt Jongbloed. Statistica Neerlandica, 2024-06-21. https://doi.org/10.1111/stan.12346
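A generic version of the idea, sketched in Python: refit the same flexible model to data sets in which the responses are permuted against the covariates, and compare the observed fit with the fits obtained on these new pairings. The in‐sample $R^2$ of a cubic polynomial stands in for an arbitrary flexible model, and the $(1 + \#\{\text{permuted} \ge \text{observed}\})/(B + 1)$ p‐value is a standard Monte Carlo convention; the paper's null hypothesis formulation and test statistic may differ.

```python
import numpy as np


def r_squared(x, y, degree=3):
    """In-sample R^2 of a polynomial least-squares fit (stand-in for a flexible model)."""
    coef = np.polyfit(x, y, degree)
    resid = y - np.polyval(coef, x)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def permutation_test(x, y, n_perm=999, degree=3, seed=0):
    """Permutation p-value for 'the model fits (x, y) no better than pure noise'.

    New pairings are formed by permuting y against x; the model is refitted on
    each permuted data set and scored with the same statistic, so the
    flexibility of the model is present under both the observed and null fits.
    """
    rng = np.random.default_rng(seed)
    observed = r_squared(x, y, degree)
    perm_stats = np.array([r_squared(x, rng.permutation(y), degree) for _ in range(n_perm)])
    return (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)


rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, size=60)
y_signal = np.sin(3.0 * x) + rng.normal(scale=0.3, size=60)
y_noise = rng.normal(size=60)
print(permutation_test(x, y_signal))
print(permutation_test(x, y_noise))
```

Comparing against refits on permuted data, rather than against zero, is what guards against mistaking the model's capacity to fit noise for genuine predictive signal.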
We consider estimation of and inference on Average Causal Effects (ACE) when confounders are missing not at random. Identification has been discussed in the literature; however, limited effort has been devoted to developing feasible nonparametric inference methods. The primary challenge arises from estimating the missingness mechanism, an ill‐posed problem that poses obstacles to establishing asymptotic theory. This paper helps fill this gap in the following ways. First, we introduce a weak pseudo‐metric that guarantees a faster convergence rate for the missingness mechanism estimator. Second, we employ a representer to derive an explicit expression for the influence function. We also propose a practical and stable approach to estimating the variance and constructing confidence intervals. We verify our theoretical results in simulation studies.
Nonparametric causal inference with confounders missing not at random. Jiawei Shan, Xinyu Yan. Statistica Neerlandica, 2024-06-10. https://doi.org/10.1111/stan.12343
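The Python sketch below only illustrates the target quantity: a normalized (Hajek) inverse probability weighting estimate of the ACE when confounders are fully observed and the propensity score is known. It deliberately sets aside the paper's actual difficulty, confounders missing not at random, and is not the authors' estimator; the data‐generating process in the usage example is hypothetical.

```python
import numpy as np


def ipw_ace(y, a, e):
    """Normalized inverse probability weighting estimate of E[Y(1)] - E[Y(0)].

    y: outcomes, a: binary treatment indicators, e: propensity scores P(A = 1 | X).
    Assumes no unmeasured confounding and fully observed confounders.
    """
    w1 = a / e
    w0 = (1 - a) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


rng = np.random.default_rng(6)
n = 20_000
x = rng.standard_normal(n)                     # fully observed confounder
e = 1.0 / (1.0 + np.exp(-x))                   # known propensity P(A = 1 | X)
a = (rng.uniform(size=n) < e).astype(float)
y = 2.0 * a + x + rng.standard_normal(n)       # true ACE = 2
naive = y[a == 1].mean() - y[a == 0].mean()    # confounded difference in means
print(naive, ipw_ace(y, a, e))
```

When confounders are partly missing, neither the propensity score nor the weights can be computed directly, which is where the missingness mechanism, and its ill‐posed estimation, enters.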
The minorization–maximization (MM) algorithm is an optimization technique for iteratively calculating the maximizer of a concave target function rather than a root‐finding tool. In this paper, we develop, for the first time, the MM algorithm as a new method for finding the root of a univariate nonlinear equation. The key idea is to recast the root‐finding problem as the iterative maximization of a concave target function by designing a new MM algorithm. By the ascent property of the MM algorithm, the proposed algorithm converges to the root regardless of the initial value, in contrast to Newton's method. Several statistical examples are provided to demonstrate the proposed algorithm.
A new MM algorithm for root‐finding problems. Xunjian Li, Shuang Li, Guo‐Liang Tian. Statistica Neerlandica, 2024-06-03. https://doi.org/10.1111/stan.12345
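One simple way to turn root finding into an MM problem, given here as an illustrative Python sketch rather than the authors' construction: if $f$ is increasing with $0 < f'(x) \le b$ and has a root, then $F(x) = -\int_0^x f(s)\,ds$ is concave and maximized at that root, and the quadratic minorizer implied by the curvature bound yields the update $x_{k+1} = x_k - f(x_k)/b$. Each step cannot decrease $F$, so under these assumptions the iterates converge from any starting value. The function name and the bound `b` are hypothetical inputs.

```python
import math


def mm_root(f, b, x0, tol=1e-12, max_iter=10_000):
    """Root of an increasing f with 0 < f'(x) <= b via a quadratic-minorizer MM scheme.

    The target F(x) = -integral_0^x f(s) ds is concave with F' = -f. Since
    F'' = -f' >= -b, we have the minorizer
        F(x) >= F(x_k) - f(x_k) (x - x_k) - (b / 2) (x - x_k)^2,
    whose maximizer is x_{k+1} = x_k - f(x_k) / b.
    """
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / b
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("MM iteration did not converge")


print(mm_root(math.atan, b=1.0, x0=10.0))                          # Newton's method diverges on atan from x0 = 10
print(mm_root(lambda x: 2 * x + math.sin(x) - 1, b=3.0, x0=-100.0))
```

The price of the global guarantee is speed: far from the root the steps are only as large as the curvature bound allows, whereas Newton's method converges faster once it is close enough but can overshoot from a poor start.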
In this study, we address the problem of high‐dimensional binary classification. We propose an aggregation technique based on exponential weights and the empirical hinge loss. Using a suitable sparsity‐inducing prior distribution, we show that our method enjoys favorable theoretical guarantees on the prediction error. The procedure is computed efficiently with Langevin Monte Carlo, a gradient‐based sampling approach. To illustrate the effectiveness of our approach, we compare it with the logistic Lasso on simulated data and on a real dataset, where our method frequently outperforms the logistic Lasso.
High‐dimensional sparse classification using exponential weighting with empirical hinge loss. The Tien Mai. Statistica Neerlandica, 2024-05-24. https://doi.org/10.1111/stan.12342
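A rough Python sketch of the computational ingredients named above, under assumptions we flag: the pseudo‐posterior is taken as $\exp(-\lambda \sum_i \max(0, 1 - y_i x_i^\top\theta))$ times a scaled Student‐type sparsity prior $\prod_j (\tau^2 + \theta_j^2)^{-2}$, and it is sampled with the unadjusted Langevin algorithm using a subgradient of the hinge loss. The prior form, tuning values (`lam`, `tau`, `step`), and the function name are hypothetical and may not match the paper.

```python
import numpy as np


def hinge_lmc(X, y, lam=1.0, tau=0.1, step=1e-3, n_iter=3000, burn_in=1000, seed=0):
    """Unadjusted Langevin sampler for a Gibbs pseudo-posterior with hinge loss.

    Target density (up to a constant):
        exp(-lam * sum_i max(0, 1 - y_i x_i^T theta)) * prod_j (tau^2 + theta_j^2)^(-2).
    The hinge loss is not differentiable, so a subgradient enters the drift.
    Returns the average of the post-burn-in draws.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    total = np.zeros(p)
    for k in range(n_iter):
        margins = y * (X @ theta)
        active = margins < 1.0                              # points with positive hinge loss
        grad_loss = -lam * (X[active].T @ y[active])        # subgradient of lam * sum of hinge losses
        grad_log_prior = -4.0 * theta / (tau ** 2 + theta ** 2)
        drift = -grad_loss + grad_log_prior                 # gradient of the log target
        theta = theta + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(p)
        if k >= burn_in:
            total += theta
    return total / (n_iter - burn_in)


rng = np.random.default_rng(3)
n, p = 200, 20
theta_star = np.zeros(p)
theta_star[:3] = [2.0, -2.0, 2.0]                           # sparse truth
X = rng.standard_normal((n, p))
y = np.sign(X @ theta_star)
theta_hat = hinge_lmc(X, y)
print(np.round(theta_hat, 2))
print("training accuracy:", np.mean(np.sign(X @ theta_hat) == y))
```

The prior's sharp peak at zero with heavy tails pulls irrelevant coordinates toward zero while leaving large coefficients nearly unpenalized, which is the role the sparsity‐inducing prior plays in exponential weighting.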