Pub Date : 2023-12-14 DOI: 10.1007/s11749-023-00903-9
Gilles Crommen, Jad Beyhum, Ingrid Van Keilegom
This paper considers the problem of inferring the causal effect of a variable Z on a dependently censored survival time T. We allow for unobserved confounding variables, such that the error term of the regression model for T is dependent on the confounded variable Z. Moreover, T is subject to dependent censoring. This means that T is right censored by a censoring time C, which is dependent on T (even after conditioning out the effects of the measured covariates). A control function approach, relying on an instrumental variable, is leveraged to tackle the confounding issue. Further, it is assumed that T and C follow a joint regression model with bivariate Gaussian error terms and an unspecified covariance matrix, such that the dependent censoring can be handled in a flexible manner. Conditions under which the model is identifiable are given, a two-step estimation procedure is proposed, and it is shown that the resulting estimator is consistent and asymptotically normal. Simulations are used to confirm the validity and finite-sample performance of the estimation procedure. Finally, the proposed method is used to estimate the causal effect of job training programs on unemployment duration.
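The control function idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' estimator: it ignores censoring entirely and uses ordinary least squares in both steps, whereas the paper fits a joint Gaussian model for T and C. The instrument `w`, the simulated coefficients, and the function names are assumptions made purely for illustration.

```python
import numpy as np

def control_function_fit(y, z, w):
    """Two-step control function estimate of the effect of z on y (no censoring)."""
    ones = np.ones(len(y))
    # Step 1: regress the confounded variable z on the instrument w.
    x1 = np.column_stack([ones, w])
    gamma, *_ = np.linalg.lstsq(x1, z, rcond=None)
    v_hat = z - x1 @ gamma  # estimated control function (first-stage residual)
    # Step 2: regress y on z and the control function.
    x2 = np.column_stack([ones, z, v_hat])
    coef, *_ = np.linalg.lstsq(x2, y, rcond=None)
    return coef[1]

def naive_fit(y, z):
    """Least squares of y on z, ignoring the confounding."""
    x = np.column_stack([np.ones(len(y)), z])
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef[1]

# Simulated data (hypothetical design): v drives both z and the outcome error.
rng = np.random.default_rng(0)
n = 20000
w = rng.normal(size=n)                 # instrument
v = rng.normal(size=n)                 # unobserved confounding component
z = 0.8 * w + v                        # confounded variable
true_effect = 1.5
log_t = 0.5 + true_effect * z + 1.0 * v + rng.normal(size=n)

cf_estimate = control_function_fit(log_t, z, w)
naive_estimate = naive_fit(log_t, z)
print(cf_estimate, naive_estimate)
```

In this toy design the naive regression absorbs the dependence between `z` and the error term, while adding the first-stage residual as a regressor removes it.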
Title: An instrumental variable approach under dependent censoring. Journal: Test.
Pub Date : 2023-12-13 DOI: 10.1007/s11749-023-00907-5
Ian L. Dryden
The discussion focuses on the different choices that are made by the user in carrying out shape-based functional data analysis. First, there is the choice of an additional warping penalty that can be included in the procedure. An object-oriented data analysis approach can be useful for selecting such a warping penalty, and an example from monitoring peatland is given. Also, there is a choice to be made about whether the analysis is in a quotient manifold or an ambient space. There are advantages and disadvantages to either strategy, but in many examples, the results are similar due to a Laplace approximation. The final comment states that the authors provide plenty of convincing approaches with many useful insights. It is clear that the square root velocity function (SRVF) and transported SRVF methods will give solutions to many more problems in the future.
Title: Comments on: Shape-based functional data analysis. Journal: Test.
Pub Date : 2023-12-12 DOI: 10.1007/s11749-023-00906-6
Paavo Sattler, Markus Pauly
Correlation matrices are an essential tool for investigating the dependency structures of random vectors or comparing them. We introduce an approach for testing a variety of null hypotheses that can be formulated based upon the correlation matrix. Examples cover MANOVA-type hypotheses of equal correlation matrices as well as tests for special correlation structures such as sphericity. Apart from the existence of fourth moments, our approach requires no other assumptions, allowing applications in various settings. To improve the small sample performance, a bootstrap technique is proposed and theoretically justified. Based on this, we also present a procedure to simultaneously test the hypotheses of equal correlation and equal covariance matrices. The performance of all new test statistics is compared with existing procedures through extensive simulations.
Title: Testing hypotheses about correlation matrices in general MANOVA designs. Journal: Test.
Pub Date : 2023-12-11 DOI: 10.1007/s11749-023-00902-w
Nicola Loperfido
Tensor eigenvectors naturally generalize matrix eigenvectors to multi-way arrays: eigenvectors of symmetric tensors of order k and dimension p are stationary points of polynomials of degree k in p variables on the unit sphere. Dominant eigenvectors of symmetric tensors maximize polynomials in several variables on the unit sphere, while base eigenvectors are roots of polynomials in several variables. In this paper, we focus on skewness-based projection pursuit and on third-order tensor eigenvectors, which provide the simplest, yet relevant connections between tensor eigenvectors and projection pursuit. Skewness-based projection pursuit finds interesting data projections using the dominant eigenvector of the sample third standardized cumulant to maximize skewness. Skewness-based projection pursuit also uses base eigenvectors of the sample third cumulant to remove skewness and facilitate the search for interesting data features other than skewness. Our contribution to the literature on tensor eigenvectors and on projection pursuit is twofold. Firstly, we show how skewness-based projection pursuit might be helpful in sequential cluster detection. Secondly, we show some asymptotic results regarding both dominant and base tensor eigenvectors of sample third cumulants. The practical relevance of the theoretical results is assessed with six well-known data sets.
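As a rough illustration of the dominant-eigenvector step described in the abstract above, the following Python sketch searches for the unit vector maximizing the skewness of a projection of whitened data. It uses a plain higher-order power iteration with random restarts, which is an assumption of this sketch and not necessarily the algorithm used in the paper; the iteration is not guaranteed to converge in general and may return a local maximizer. All function names and the simulated data are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center the data and rescale it to identity sample covariance."""
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / len(xc)
    vals, vecs = np.linalg.eigh(cov)
    return xc @ (vecs / np.sqrt(vals)) @ vecs.T  # symmetric inverse square root

def projection_skewness(y, u):
    """Skewness of y @ u for whitened y and unit u (mean 0, variance 1)."""
    return np.mean((y @ u) ** 3)

def dominant_skewness_direction(y, n_starts=10, n_iter=200, tol=1e-10, seed=0):
    """Power iteration u <- K(., u, u) / ||K(., u, u)|| on the third-moment tensor K."""
    rng = np.random.default_rng(seed)
    n, p = y.shape
    best_u, best_skew = None, -np.inf
    for _ in range(n_starts):
        u = rng.normal(size=p)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            # K(., u, u) computed without forming the p x p x p tensor.
            g = y.T @ (y @ u) ** 2 / n
            norm = np.linalg.norm(g)
            if norm == 0.0:
                break
            u_new = g / norm
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        skew = projection_skewness(y, u)
        if skew > best_skew:
            best_u, best_skew = u, skew
    return best_u, best_skew

# Toy data: one skewed (centered exponential) source mixed with Gaussian noise.
rng = np.random.default_rng(1)
n = 5000
sources = np.column_stack([rng.exponential(size=n) - 1.0, rng.normal(size=(n, 3))])
mixing = rng.normal(size=(4, 4))
x = sources @ mixing.T

u_hat, skew_hat = dominant_skewness_direction(whiten(x))
print(u_hat, skew_hat)
```

The restarts are there because a single random start can land near a spurious stationary point; keeping the start with the largest projected skewness is a simple safeguard.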
Title: Tensor eigenvectors for projection pursuit. Journal: Test.
Pub Date : 2023-12-11 DOI: 10.1007/s11749-023-00904-8
Pedro Delicado
Title: Comments on: Shape-based functional data analysis. Journal: Test.
Pub Date : 2023-12-09 DOI: 10.1007/s11749-023-00910-w
Armine Bagyan, Donald Richards
For \(d \ge 2\), let X be a random vector having a Bingham distribution on \({\mathcal {S}}^{d-1}\), the unit sphere centered at the origin in \({\mathbb {R}}^d\), and let \(\Sigma\) denote the symmetric matrix parameter of the distribution. Let \(\Psi (\Sigma )\) be the normalizing constant of the distribution and let \(\nabla \Psi _d(\Sigma )\) be the matrix of first-order partial derivatives of \(\Psi (\Sigma )\) with respect to the entries of \(\Sigma\). We derive complete asymptotic expansions for \(\Psi (\Sigma )\) and \(\nabla \Psi _d(\Sigma )\), as \(d \rightarrow \infty\); these expansions are obtained subject to the growth condition that \(\Vert \Sigma \Vert\), the Frobenius norm of \(\Sigma\), satisfies \(\Vert \Sigma \Vert \le \gamma _0 d^{r/2}\) for all d, where \(\gamma _0 > 0\) and \(r \in [0,1)\). Consequently, we obtain for the covariance matrix of X an asymptotic expansion up to terms of arbitrary degree in \(\Sigma\). Using a range of values of d that have appeared in a variety of applications of high-dimensional spherical data analysis, we tabulate the bounds on the remainder terms in the expansions of \(\Psi (\Sigma )\) and \(\nabla \Psi _d(\Sigma )\) and we demonstrate the rapid convergence of the bounds to zero as r decreases.
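For orientation, the quantities named in the abstract above can be written out as follows. This is a sketch under the usual convention for the Bingham family, which the abstract does not spell out: \(\omega\) is assumed to be the uniform probability measure on the sphere, and the gradient is taken entrywise, treating the entries of \(\Sigma\) as free variables.

```latex
% Bingham density with respect to the uniform measure \omega (assumed convention)
\[
  f(x;\Sigma) = \frac{1}{\Psi_d(\Sigma)} \exp\!\left(x^{\top}\Sigma x\right),
  \qquad x \in \mathcal{S}^{d-1},
  \qquad
  \Psi_d(\Sigma) = \int_{\mathcal{S}^{d-1}} \exp\!\left(x^{\top}\Sigma x\right)\,\mathrm{d}\omega(x).
\]
% Differentiating under the integral sign with respect to the entries of \Sigma
\[
  \nabla \Psi_d(\Sigma) = \int_{\mathcal{S}^{d-1}} x x^{\top} \exp\!\left(x^{\top}\Sigma x\right)\,\mathrm{d}\omega(x),
  \qquad
  \mathbb{E}\!\left[X X^{\top}\right] = \frac{\nabla \Psi_d(\Sigma)}{\Psi_d(\Sigma)}.
\]
% The density is antipodally symmetric, so E[X] = 0 and
\[
  \operatorname{Cov}(X) = \mathbb{E}\!\left[X X^{\top}\right] = \frac{\nabla \Psi_d(\Sigma)}{\Psi_d(\Sigma)}.
\]
```

Under these assumptions, expansions of \(\Psi_d(\Sigma)\) and \(\nabla \Psi_d(\Sigma)\) translate directly into an expansion of the covariance matrix, which is consistent with the "Consequently" step in the abstract.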
Title: Complete asymptotic expansions and the high-dimensional Bingham distributions. Journal: Test.
Pub Date : 2023-12-07 DOI: 10.1007/s11749-023-00900-y
B. Cooper Boniece, Lajos Horváth, Peter M. Jacobs
We consider the problem of detecting distributional changes in a sequence of high dimensional data. Our approach combines two separate statistics stemming from \(L_p\) norms whose behavior is similar under \(H_0\) but potentially different under \(H_A\), leading to a testing procedure that is flexible against a variety of alternatives. We establish the asymptotic distribution of our proposed test statistics separately in cases of weakly dependent and strongly dependent coordinates as \(\min \{N,d\}\rightarrow \infty\), where N denotes sample size and d is the dimension, and establish consistency of testing and estimation procedures in high dimensions under one-change alternative settings. Computational studies in single and multiple change point scenarios demonstrate our method can outperform other nonparametric approaches in the literature for certain alternatives in high dimensions. We illustrate our approach through an application to Twitter data concerning the mentions of U.S. governors.
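To give a feel for single change point estimation in many coordinates, here is a short Python sketch. It deliberately swaps in a classical technique: a standardized CUSUM statistic for a change in mean, summed over coordinates. It is not the U-statistic construction based on \(L_p\) norms from the paper, and the function name and simulated data are hypothetical.

```python
import numpy as np

def cusum_change_point(x):
    """Estimate a single mean change in an (n, d) array via an aggregated CUSUM.

    Returns (k_hat, max_stat), where rows 0..k_hat-1 are estimated to precede the change.
    """
    n, d = x.shape
    csum = np.cumsum(x, axis=0)
    total = csum[-1]
    k = np.arange(1, n)                       # candidate split points 1..n-1
    # CUSUM process S_k - (k/n) S_n for every coordinate.
    dev = csum[:-1] - np.outer(k / n, total)
    # Standard deviation of the CUSUM process under unit-variance noise.
    weights = np.sqrt(k * (n - k) / n)
    stat = np.sum((dev / weights[:, None]) ** 2, axis=1)
    best = int(np.argmax(stat))
    return int(k[best]), float(stat[best])

# Toy data: N = 200 observations in d = 50 dimensions, mean shift after row 120.
rng = np.random.default_rng(0)
n_obs, dim, true_k = 200, 50, 120
data = rng.normal(size=(n_obs, dim))
data[true_k:] += 0.5

k_hat, max_stat = cusum_change_point(data)
print(k_hat, max_stat)
```

A mean-based statistic like this one only reacts to location shifts; detecting more general distributional changes is the motivation for the statistics studied in the paper.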
Title: Change point detection in high dimensional data with U-statistics. Journal: Test.
Pub Date : 2023-11-28 DOI: 10.1007/s11749-023-00901-x
Almond Stöcker, Lisa Steyer, Sonja Greven
Title: Comments on: shape-based functional data analysis. Journal: Test.
Pub Date : 2023-11-21 DOI: 10.1007/s11749-023-00891-w
Emilia Siviero, Emilie Chautru, Stephan Clémençon
In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees of the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task, the flagship problem in Geostatistics, from a statistical learning perspective, i.e., by carrying out a nonparametric finite-sample predictive analysis. Given \(d\ge 1\) values taken by a realization of a square integrable random field \(X=\{X_s\}_{s\in S}\), \(S\subset {\mathbb {R}}^2\), with unknown covariance structure, at sites \(s_1,\; \ldots ,\; s_d\) in S, the goal is to predict the unknown values it takes at any other location \(s\in S\) with minimum quadratic risk. The prediction rule is derived from a training spatial dataset: a single realization \(X'\) of X, independent of the values to be predicted, observed at \(n\ge 1\) locations \(\sigma _1,\; \ldots ,\; \sigma _n\) in S. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, because the training data \(X'_{\sigma _1},\; \ldots ,\; X'_{\sigma _n}\) involved in the learning procedure are not independent and identically distributed. In this article, non-asymptotic bounds of order \(O_{{\mathbb {P}}}(1/\sqrt{n})\) are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes, observed at locations forming a regular grid in the learning stage. 
These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments, on simulated data and on real-world datasets, and hopefully pave the way for further developments in statistical learning based on spatial data.
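The predictor that the plug-in rule in the abstract above tries to mimic can be sketched in a few lines of Python. The sketch assumes a zero-mean field with a known isotropic exponential covariance, so it shows the oracle simple Kriging predictor and not the paper's plug-in rule, which has to estimate the covariance from training data. The covariance family, its parameters, and the function names are assumptions made for illustration.

```python
import numpy as np

def exp_cov(a, b, sill=1.0, length=0.5):
    """Isotropic exponential covariance between point sets a (m, 2) and b (k, 2)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-dist / length)

def simple_kriging(sites, values, targets, cov=exp_cov):
    """Simple Kriging for a zero-mean field with known covariance function.

    Returns predictions and Kriging variances at the target locations.
    """
    big_sigma = cov(sites, sites)              # covariance among observed sites
    c = cov(sites, targets)                    # covariance sites vs targets
    weights = np.linalg.solve(big_sigma, c)    # minimizers of the quadratic risk
    pred = weights.T @ values
    var = cov(targets, targets).diagonal() - np.sum(c * weights, axis=0)
    return pred, var

# Toy example: d = 30 observed sites in the unit square.
rng = np.random.default_rng(0)
sites = rng.uniform(size=(30, 2))
chol = np.linalg.cholesky(exp_cov(sites, sites) + 1e-10 * np.eye(30))
values = chol @ rng.normal(size=30)            # one realization of the field

pred_obs, var_obs = simple_kriging(sites, values, sites[:3])
new_targets = rng.uniform(size=(5, 2))
pred_new, var_new = simple_kriging(sites, values, new_targets)
print(pred_new, var_new)
```

Predicting at an already observed site returns the observed value with zero variance, which is a quick sanity check on the weights.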
Title: A statistical learning view of simple Kriging. Journal: Test.
Pub Date : 2023-11-18 DOI: 10.1007/s11749-023-00899-2
Ricardo Fraiman, Leonardo Moreno, Thomas Ransford
We address the problem of testing for the invariance of a probability measure under the action of a group of linear transformations. We propose a procedure based on consideration of one-dimensional projections, justified using a variant of the Cramér–Wold theorem. Our test procedure is powerful, computationally efficient, and dimension-independent, extending even to the case of infinite-dimensional spaces (multivariate functional data). It includes, as special cases, tests for exchangeability and sign-invariant exchangeability. We compare our procedure with some previous proposals in these cases, in a small simulation study. The paper concludes with two real-data examples.
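One plausible reading of the projection idea in the abstract above is sketched below in Python. The sample is split in two, the second half is transformed by a group element, both halves are projected onto random directions, and the projected samples are compared with a two-sample Kolmogorov-Smirnov statistic using a Bonferroni correction over directions. The sample splitting, the Bonferroni step, the asymptotic critical value, and all names are assumptions of this sketch and may differ from the procedure in the paper. Only a single group element (a cyclic shift of coordinates) is tested here.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance between 1-D samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def invariance_test(x, transform, n_dirs=20, alpha=0.05, seed=0):
    """Test whether the law of the rows of x is invariant under `transform`.

    Returns (largest scaled KS statistic, reject flag).
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    half = n // 2
    first = x[:half]
    second = transform(x[half:2 * half])       # independent half, transformed
    scale = np.sqrt(half / 2.0)                # sqrt(n*m/(n+m)) with n = m = half
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    stats = np.array([scale * ks_statistic(first @ h, second @ h) for h in dirs])
    # Asymptotic KS critical value at level alpha / n_dirs (Bonferroni).
    critical = np.sqrt(-0.5 * np.log(alpha / (2 * n_dirs)))
    return float(stats.max()), bool(stats.max() > critical)

def cyclic_shift(x):
    """Permute coordinates cyclically: one generator of the permutation group."""
    return np.roll(x, 1, axis=1)

rng = np.random.default_rng(1)
exchangeable = rng.normal(size=(2000, 5))                      # i.i.d. coordinates
shifted_means = rng.normal(size=(2000, 5)) + np.arange(5)      # not exchangeable

stat_null, reject_null = invariance_test(exchangeable, cyclic_shift)
stat_alt, reject_alt = invariance_test(shifted_means, cyclic_shift)
print(stat_null, reject_null, stat_alt, reject_alt)
```

Testing exchangeability in full would require checking a generating set of the permutation group, for example a transposition together with the cyclic shift used here.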
Title: Application of the Cramér–Wold theorem to testing for invariance under group actions. Journal: Test.