
Electronic Journal of Statistics: Latest Articles

Structure learning via unstructured kernel-based M-estimation
Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2023-01-01; DOI: 10.1214/23-ejs2153
Xin He, Yeheng Ge, Xingdong Feng
In statistical learning, identifying the underlying structure of the true target function from observed data plays a crucial role in facilitating subsequent modeling and analysis. Unlike most existing methods, which focus on specific settings under certain model assumptions, this paper proposes a general and novel framework for recovering the true structure of target functions by using unstructured M-estimation in a reproducing kernel Hilbert space (RKHS). The framework is inspired by the fact that gradient functions can be employed as a valid tool to learn underlying structures, including sparse learning, interaction selection and model identification, and it is easy to implement by taking advantage of some nice properties of the RKHS. More importantly, it admits a wide range of loss functions and thus includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification; it is also computationally efficient because it only requires solving convex optimization tasks. The asymptotic results of the proposed framework are established within a rich family of loss functions without any explicit model specifications. The superior performance of the proposed framework is also demonstrated by a variety of simulated examples and a real case study.
Citations: 0
Regression in tensor product spaces by the method of sieves.
IF 1; Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2023-01-01; Epub Date: 2023-12-07; DOI: 10.1214/23-ejs2188
Tianyu Zhang, Noah Simon

Estimation of a conditional mean (linking a set of features to an outcome of interest) is a fundamental statistical task. While there is an appeal to flexible nonparametric procedures, effective estimation in many classical nonparametric function spaces, e.g., multivariate Sobolev spaces, can be prohibitively difficult, both statistically and computationally, especially when the number of features is large. In this paper, we present some sieve estimators for regression in multivariate product spaces. We take Sobolev-type smoothness spaces as an example, though our general framework can be applied to many reproducing kernel Hilbert spaces. These spaces are more amenable to multivariate regression, and allow us to, in part, avoid the curse of dimensionality. Our estimator can be easily applied to multivariate nonparametric problems and has appealing statistical and computational properties. Moreover, it can effectively leverage additional structure such as feature sparsity.

Electronic Journal of Statistics, vol. 17, no. 2, pp. 3660-3727. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784939/pdf/
Citations: 0
Estimation of the Hurst parameter from continuous noisy data
Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2023-01-01; DOI: 10.1214/23-ejs2156
Pavel Chigansky, Marina Kleptsyna
This paper addresses the problem of estimating the Hurst exponent of fractional Brownian motion from a continuous-time noisy sample. When the Hurst parameter is greater than 3/4, consistent estimation is possible only if either the length of the observation interval increases to infinity or the intensity of the noise decreases to zero. The main result is a proof of the Local Asymptotic Normality (LAN) of the model in these two regimes, which reveals the optimal minimax estimation rates.
Citations: 1
Improving estimation efficiency for two-phase, outcome-dependent sampling studies
IF 1.1; Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2022-12-19; DOI: 10.1214/23-ejs2124
Menglu Che, Peisong Han, J. Lawless
Two-phase outcome dependent sampling (ODS) is widely used in many fields, especially when certain covariates are expensive and/or difficult to measure. For two-phase ODS, the conditional maximum likelihood (CML) method is very attractive because it can handle zero Phase 2 selection probabilities and avoids modeling the covariate distribution. However, most existing CML-based methods use only the Phase 2 sample and thus may be less efficient than other methods. We propose a general empirical likelihood method that uses CML augmented with additional information in the whole Phase 1 sample to improve estimation efficiency. The proposed method maintains the ability to handle zero selection probabilities and avoids modeling the covariate distribution, but can lead to substantial efficiency gains over CML in the inexpensive covariates, or in the influential covariate when a surrogate is available, because of an effective use of the Phase 1 data. Simulations and a real data illustration using NHANES data are presented.
Citations: 0
Posterior contraction and testing for multivariate isotonic regression
IF 1.1; Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2022-11-22; DOI: 10.1214/23-ejs2115
Kang-Kang Wang, S. Ghosal
We consider the nonparametric regression problem with multiple predictors and an additive error, where the regression function is assumed to be coordinatewise nondecreasing. We propose a Bayesian approach to make inference on the multivariate monotone regression function, obtain the posterior contraction rate, and construct a universally consistent Bayesian testing procedure for multivariate monotonicity. To facilitate posterior analysis, we set aside the shape restrictions temporarily, and endow a prior on blockwise constant regression functions with heights independently normally distributed. The unknown variance of the error term is either estimated by the marginal maximum likelihood estimate or is equipped with an inverse-gamma prior. Then the unrestricted block heights are a posteriori also independently normally distributed given the error variance, by conjugacy. To comply with the shape restrictions, we project samples from the unrestricted posterior onto the class of multivariate monotone functions, inducing the "projection-posterior distribution", to be used for making inference. Under an $\mathbb{L}_1$-metric, we show that the projection-posterior based on $n$ independent samples contracts around the true monotone regression function at the optimal rate $n^{-1/(2+d)}$. Then we construct a Bayesian test for multivariate monotonicity based on the posterior probability of a shrinking neighborhood of the class of multivariate monotone functions. We show that the test is universally consistent, that is, the level of the Bayesian test goes to zero, and the power at any fixed alternative goes to one. Moreover, we show that for a smooth alternative function, power goes to one as long as its distance to the class of multivariate monotone functions is at least of the order of the estimation error for a smooth function.
Citations: 2
A bootstrap method for spectral statistics in high-dimensional elliptical models
IF 1.1; Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2022-09-08; DOI: 10.1214/23-ejs2140
Si-Ying Wang, Miles E. Lopes
Although there is an extensive literature on the eigenvalues of high-dimensional sample covariance matrices, much of it is specialized to independent components (IC) models, in which observations are represented as linear transformations of random vectors with independent entries. By contrast, less is known in the context of elliptical models, which violate the independence structure of IC models and exhibit quite different statistical phenomena. In particular, very little is known about the scope of bootstrap methods for doing inference with spectral statistics in high-dimensional elliptical models. To fill this gap, we show how a bootstrap approach developed previously for IC models can be extended to handle the different properties of elliptical models. Within this setting, our main theoretical result guarantees that the proposed method consistently approximates the distributions of linear spectral statistics, which play a fundamental role in multivariate analysis. We also provide empirical results showing that the proposed method performs well for a variety of nonlinear spectral statistics.
Citations: 1
Intuitive joint priors for Bayesian linear multilevel models: The R2D2M2 prior
IF 1.1; Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2022-08-15; DOI: 10.1214/23-ejs2136
Javier Enrique Aguilar, Paul-Christian Bürkner
The training of high-dimensional regression models on comparably sparse data is an important yet complicated topic, especially when there are many more model parameters than observations in the data. From a Bayesian perspective, inference in such cases can be achieved with the help of shrinkage prior distributions, at least for generalized linear models. However, real-world data usually possess multilevel structures, such as repeated measurements or natural groupings of individuals, which existing shrinkage priors are not built to deal with. We generalize and extend one of these priors, the R2D2 prior by Zhang et al. (2020), to linear multilevel models leading to what we call the R2D2M2 prior. The proposed prior enables both local and global shrinkage of the model parameters. It comes with interpretable hyperparameters, which we show to be intrinsically related to vital properties of the prior, such as rates of concentration around the origin, tail behavior, and amount of shrinkage the prior exerts. We offer guidelines on how to select the prior's hyperparameters by deriving shrinkage factors and measuring the effective number of non-zero model coefficients. Hence, the user can readily evaluate and interpret the amount of shrinkage implied by a specific choice of hyperparameters. Finally, we perform extensive experiments on simulated and real data, showing that our inference procedure for the prior is well calibrated, has desirable global and local regularization properties and enables the reliable and interpretable estimation of much more complex Bayesian multilevel models than was previously possible.
Citations: 11
Functional spherical autocorrelation: A robust estimate of the autocorrelation of a functional time series
IF 1.1; Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2022-07-12; DOI: 10.1214/23-ejs2112
Chi-Kuang Yeh, Gregory Rice, J. Dubin
We propose a new autocorrelation measure for functional time series that we term spherical autocorrelation. It is based on measuring the average angle between lagged pairs of series after having been projected onto the unit sphere. This new measure enjoys several complementary advantages compared to existing autocorrelation measures for functional data, since it both 1) describes a notion of sign or direction of serial dependence in the series, and 2) is more robust to outliers. The asymptotic properties of estimators of the spherical autocorrelation are established, and are used to construct confidence intervals and portmanteau white noise tests. These confidence intervals and tests are shown to be effective in simulation experiments, and demonstrated in applications to model selection for daily electricity price curves, and measuring the volatility in densely observed asset price data.
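The idea of projecting curves onto the unit sphere and comparing lagged pairs can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes the curves are observed on a common grid, centers them by the pointwise sample mean (the paper may center differently, for example with a more robust location estimate), and averages the cosine of the angle between lagged unit vectors. The function name `spherical_autocorrelation` is hypothetical.

```python
import numpy as np


def spherical_autocorrelation(curves, lag):
    """Average cosine of the angle between lagged, sphere-projected curves.

    curves : array of shape (T, m), one discretized curve per row.
    lag    : positive integer lag h with h < T.
    """
    curves = np.asarray(curves, dtype=float)
    n_curves = curves.shape[0]
    if not 0 < lag < n_curves:
        raise ValueError("lag must satisfy 0 < lag < number of curves")

    # Assumption: center by the pointwise sample mean.
    centered = curves - curves.mean(axis=0)

    # Project each centered curve onto the unit sphere; a curve equal to
    # the mean has no direction and is left as the zero vector.
    norms = np.linalg.norm(centered, axis=1)
    unit = centered / np.where(norms > 0, norms, 1.0)[:, None]

    # Inner products of unit vectors are cosines of the angles between them.
    cosines = np.sum(unit[:-lag] * unit[lag:], axis=1)
    return float(cosines.mean())
```

Because every curve is rescaled to unit length, a single curve with a very large norm contributes no more than any other curve, which is the intuition behind the robustness claim; the sign of the result indicates whether lagged curves tend to point in the same or in opposite directions.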
Citations: 1
Training-conditional coverage for distribution-free predictive inference
IF 1.1; Zone 4, Mathematics; Q3 STATISTICS & PROBABILITY; Pub Date: 2022-05-07; DOI: 10.1214/23-ejs2145
Michael Bian, R. Barber
The field of distribution-free predictive inference provides tools for provably valid prediction without any assumptions on the distribution of the data, which can be paired with any regression algorithm to provide accurate and reliable predictive intervals. The guarantees provided by these methods are typically marginal, meaning that predictive accuracy holds on average over both the training data set and the test point that is queried. However, it may be preferable to obtain a stronger guarantee of training-conditional coverage, which would ensure that most draws of the training data set result in high predictive accuracy on future test points. This property is known to hold for the split conformal prediction method. In this work, we examine the training-conditional coverage properties of several other distribution-free predictive inference methods, and find that training-conditional coverage is achieved by some methods but is impossible to guarantee without further assumptions for others.
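For readers unfamiliar with the split conformal method mentioned above, the following is a minimal sketch of the standard construction with absolute-residual scores. The choice of a least-squares line as the regression algorithm, the function name `split_conformal_interval`, and its parameters are illustrative assumptions; any fitted predictor could take the place of the line.

```python
import numpy as np


def split_conformal_interval(x_train, y_train, x_calib, y_calib, x_new, alpha=0.1):
    """Split conformal prediction interval around a fitted regression.

    The model is fit on the training split only; the calibration split
    supplies the residual quantile that sets the interval half-width.
    """
    # Assumption: a least-squares line stands in for "any regression algorithm".
    coef = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return np.polyval(coef, x)

    # Conformity scores: absolute residuals on the held-out calibration split.
    scores = np.sort(np.abs(np.asarray(y_calib) - predict(np.asarray(x_calib))))
    n = scores.size

    # Use the ceil((1 - alpha)(n + 1))-th smallest score; if that index
    # exceeds n, the interval is the whole real line.
    k = int(np.ceil((1 - alpha) * (n + 1)))
    half_width = np.inf if k > n else scores[k - 1]

    center = predict(np.asarray(x_new, dtype=float))
    return center - half_width, center + half_width
```

The marginal guarantee refers to coverage averaged over both splits and the test point. Training-conditional coverage instead asks how the coverage behaves once the training and calibration data are held fixed, which is the quantity one would estimate by repeatedly drawing new test points for a single fitted interval.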
Citations: 11
Tail inference using extreme U-statistics
IF 1.1 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2022-03-16 DOI: 10.1214/23-ejs2129
Jochem Oorschot, J. Segers, Chen Zhou
Extreme U-statistics arise when the kernel of a U-statistic has a high degree but depends on its arguments only through a small number of top order statistics. As the kernel degree of the U-statistic grows to infinity with the sample size, estimators built out of such statistics form an intermediate family in between those constructed in the block maxima and peaks-over-threshold frameworks in extreme value analysis. The asymptotic normality of extreme U-statistics based on location-scale invariant kernels is established. Although the asymptotic variance coincides with that of the Hájek projection, the proof goes beyond considering the first term in Hoeffding's variance decomposition. We propose a kernel depending on the three highest order statistics leading to a location-scale invariant estimator of the extreme value index resembling the Pickands estimator. This extreme Pickands U-estimator is asymptotically normal and its finite-sample performance is competitive with that of the pseudo-maximum likelihood estimator.
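For orientation, the classical Pickands estimator that the abstract refers to can be sketched as follows. This is not the extreme Pickands U-estimator proposed in the paper, whose kernel is not given in the abstract; it is only the textbook estimator built from the order statistics X_{n-k+1:n}, X_{n-2k+1:n} and X_{n-4k+1:n}. The function names `pickands_estimator` and `gpd_quantiles` are hypothetical, and the input is a deterministic grid of generalized Pareto quantiles rather than real data.

```python
import math


def pickands_estimator(sample, k):
    """Classical Pickands estimate of the extreme value index (illustrative).

    Uses the order statistics X_{n-k+1:n}, X_{n-2k+1:n} and X_{n-4k+1:n},
    so it requires 4*k <= n.
    """
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4*k <= n")
    xs = sorted(sample)  # ascending order: xs[j - 1] is X_{j:n}
    upper = xs[n - k]
    middle = xs[n - 2 * k]
    lower = xs[n - 4 * k]
    if upper <= middle or middle <= lower:
        raise ValueError("ties among the selected order statistics")
    # Only differences of order statistics enter, in a ratio, so the
    # estimate is unchanged by shifting and positive rescaling of the data.
    return math.log((upper - middle) / (middle - lower)) / math.log(2.0)


def gpd_quantiles(n, gamma):
    """Hypothetical test input: generalized Pareto quantiles at levels i/(n+1).

    Assumes gamma != 0; the quantile function is ((1 - p)**(-gamma) - 1) / gamma.
    """
    return [(((n + 1) / (n + 1 - i)) ** gamma - 1.0) / gamma
            for i in range(1, n + 1)]


if __name__ == "__main__":
    x = gpd_quantiles(1000, 0.5)
    # On these exact quantiles the ratio of spacings equals 2**gamma,
    # so the estimate is approximately 0.5 up to floating-point rounding.
    print(pickands_estimator(x, 50))
    # Location-scale invariance: the shifted and rescaled data give
    # approximately the same estimate.
    print(pickands_estimator([3.0 * v + 10.0 for v in x], 50))
```

On real data the estimate varies with the choice of `k`, which this sketch leaves to the caller.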
Citations: 1