Pub Date: 2022-06-10. DOI: 10.1007/s10182-022-00450-y
Ursula Berger, Göran Kauermann, Helmut Küchenhoff
The authors make an important contribution, presenting a comprehensive and thoughtful overview of the many different aspects of data, statistics and data analyses during the recent COVID-19 pandemic and discussing all relevant topics. The paper certainly provides a very valuable reflection on what has been done, what could have been done and what needs to be done. We contribute a few comments and some additional issues. We do not discuss all chapters of Jahn et al. (AStA Adv Stat Anal, 2022. 10.1007/s10182-022-00439-7), but focus on those where our personal views and experiences might add some additional aspects.
{"title":"Discussion on On the role of data, statistics and decisions in a pandemic","authors":"Ursula Berger, Göran Kauermann, Helmut Küchenhoff","doi":"10.1007/s10182-022-00450-y","DOIUrl":"10.1007/s10182-022-00450-y","url":null,"abstract":"<div><p>The authors make an important contribution presenting a comprehensive and thoughtful overview about the many different aspects of data, statistics and data analyses in times of the recent COVID-19 pandemic discussing all relevant topics. The paper certainly provides a very valuable reflection of what has been done, what could have been done and what needs to be done. We contribute here with a few comments and some additional issues. We do not discuss all chapters of Jahn et al. (AStA Adv Stat Anal, 2022. 10.1007/s10182-022-00439-7), but focus on those where our personal views and experiences might add some additional aspects.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"106 3","pages":"387 - 390"},"PeriodicalIF":1.4,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00450-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50017885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-06-09. DOI: 10.1007/s10182-022-00449-5
Sebastian Contreras, Jonas Dehning, Viola Priesemann
{"title":"Describing a landscape we are yet discovering","authors":"Sebastian Contreras, Jonas Dehning, Viola Priesemann","doi":"10.1007/s10182-022-00449-5","DOIUrl":"10.1007/s10182-022-00449-5","url":null,"abstract":"","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"106 3","pages":"399 - 402"},"PeriodicalIF":1.4,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00449-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50035658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-06-02. DOI: 10.1007/s10182-022-00448-6
Rodolfo Metulini, Giorgio Gnecco, Francesco Biancalani, Massimo Riccaboni
Multi-regional input–output (I/O) matrices provide the networks of within- and cross-country economic relations. In the context of I/O analysis, the methodology adopted by national statistical offices in data collection raises the issue of obtaining reliable data in a timely fashion, and it makes the reconstruction of (parts of) the I/O matrices of particular interest. In this work, we propose a method combining hierarchical clustering and matrix completion with a LASSO-like nuclear norm penalty to predict missing entries of a partially unknown I/O matrix. Through analyses based on both real-world and synthetic I/O matrices, we study the effectiveness of the proposed method in predicting missing values from both previous years' data and current data related to countries similar to the one for which current data are obscured. To show the usefulness of our method, an application based on World Input–Output Database (WIOD) tables, which are an example of industry-by-industry I/O tables, is provided. Strong similarities in structure between WIOD and other I/O tables are also found, which make the proposed approach easily generalizable to them.
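As a rough, self-contained sketch of the matrix-completion ingredient only (the authors' procedure additionally uses hierarchical clustering and a LASSO-like tuning of the nuclear-norm penalty), a soft-impute-style iteration could look as follows; the toy matrix, mask and penalty level are hypothetical.

```python
import numpy as np

def soft_impute(M, mask, lam, n_iter=200, tol=1e-6):
    """Nuclear-norm-penalized completion of a partially observed matrix.

    M    : array whose values outside the mask are ignored
    mask : boolean array, True where entries of M are observed
    lam  : nuclear-norm penalty level (larger values give lower rank)
    """
    X = np.where(mask, M, 0.0)  # start by zero-filling the missing cells
    for _ in range(n_iter):
        # keep observed entries fixed, then soft-threshold the singular values
        U, s, Vt = np.linalg.svd(np.where(mask, M, X), full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)
        X_new = (U * s_shrunk) @ Vt
        if np.linalg.norm(X_new - X) < tol * max(1.0, np.linalg.norm(X)):
            return X_new
        X = X_new
    return X

# toy example: a low-rank "I/O-like" table with roughly 30% of entries hidden
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 20))   # rank-3 matrix
mask = rng.random(A.shape) > 0.3
A_hat = soft_impute(A, mask, lam=0.5)
print(np.abs(A_hat - A)[~mask].mean())   # mean absolute error on hidden cells
```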
{"title":"Hierarchical clustering and matrix completion for the reconstruction of world input–output tables","authors":"Rodolfo Metulini, Giorgio Gnecco, Francesco Biancalani, Massimo Riccaboni","doi":"10.1007/s10182-022-00448-6","DOIUrl":"10.1007/s10182-022-00448-6","url":null,"abstract":"<div><p>Multi-regional input–output (I/O) matrices provide the networks of within- and cross-country economic relations. In the context of I/O analysis, the methodology adopted by national statistical offices in data collection raises the issue of obtaining reliable data in a timely fashion and it makes the reconstruction of (parts of) the I/O matrices of particular interest. In this work, we propose a method combining hierarchical clustering and matrix completion with a LASSO-like nuclear norm penalty, to predict missing entries of a partially unknown I/O matrix. Through analyses based on both real-world and synthetic I/O matrices, we study the effectiveness of the proposed method to predict missing values from both previous years data and current data related to countries similar to the one for which current data are obscured. To show the usefulness of our method, an application based on World Input–Output Database (WIOD) tables—which are an example of industry-by-industry I/O tables—is provided. Strong similarities in structure between WIOD and other I/O tables are also found, which make the proposed approach easily generalizable to them.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"107 3","pages":"575 - 620"},"PeriodicalIF":1.4,"publicationDate":"2022-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00448-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50004745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-05-21. DOI: 10.1007/s10182-022-00447-7
Walter J. Radermacher
In the Corona pandemic, it became burningly clear how much good-quality statistics are needed and, at the same time, how unsuccessful we are at providing such statistics despite the existing technical and methodological possibilities and diverse data sources. It is therefore more than overdue to get to the bottom of the causes of these issues and to learn from the findings. This defines a high aspiration: first, a diagnosis should be carried out in which the causes of the deficiencies, with their interactions, are identified as broadly as possible. Second, such a broad diagnosis should result in a therapy that includes a coherent strategy that can be generalised, i.e. one that goes beyond the Corona pandemic.
{"title":"Comment on: On the role of data, statistics and decisions in a pandemic statistics for climate protection and health—dare (more) progress!","authors":"Walter J. Radermacher","doi":"10.1007/s10182-022-00447-7","DOIUrl":"10.1007/s10182-022-00447-7","url":null,"abstract":"<div><p>In the Corona pandemic, it became clear with burning clarity how much good quality statistics are needed, and at the same time how unsuccessful we are at providing such statistics despite the existing technical and methodological possibilities and diverse data sources. It is therefore more than overdue to get to the bottom of the causes of these issues and to learn from the findings. This defines a high aspiration, namely that firstly a diagnosis is carried out in which the causes of the deficiencies with their interactions are identified as broadly as possible. Secondly, such a broad diagnosis should result in a therapy that includes a coherent strategy that can be generalised, i.e. that goes beyond the Corona pandemic.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"106 3","pages":"391 - 397"},"PeriodicalIF":1.4,"publicationDate":"2022-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00447-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50041913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-05-11. DOI: 10.1007/s10182-022-00446-8
Angel G. Angelov, Magnus Ekström
The paper explores a testing problem involving four hypotheses: based on observations of two random variables X and Y, we wish to discriminate between identical survival functions, stochastic dominance of X over Y, stochastic dominance of Y over X, and crossing survival functions. Four-decision testing procedures for repeated measurements data are proposed. The tests are based on a permutation approach and do not rely on distributional assumptions. One-sided versions of the Cramér–von Mises, Anderson–Darling and Kolmogorov–Smirnov statistics are utilized. The consistency of the tests is proven. A simulation study shows good power properties and control of false-detection errors. The suggested tests are applied to data from a psychophysical experiment.
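To illustrate the permutation idea on which such tests rest (though not the authors' four-decision procedure or its handling of repeated measurements), a minimal sketch of a two-sample permutation test built on a one-sided Cramér–von Mises-type statistic is given below; the simulated data and the number of permutations are hypothetical.

```python
import numpy as np

def one_sided_cvm(x, y):
    """One-sided Cramer-von Mises-type statistic: squared positive parts of
    F_Y - F_X on the pooled sample; large values suggest X is stochastically
    larger than Y (i.e. F_X <= F_Y)."""
    pooled = np.concatenate([x, y])
    Fx = np.searchsorted(np.sort(x), pooled, side="right") / x.size
    Fy = np.searchsorted(np.sort(y), pooled, side="right") / y.size
    return np.sum(np.maximum(Fy - Fx, 0.0) ** 2)

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """p-value of the one-sided statistic under random relabelling."""
    rng = np.random.default_rng(seed)
    obs = one_sided_cvm(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if one_sided_cvm(pooled[:x.size], pooled[x.size:]) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, size=80)   # X stochastically larger than Y
y = rng.normal(0.0, 1.0, size=80)
print(permutation_pvalue(x, y))
```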
{"title":"Tests of stochastic dominance with repeated measurements data","authors":"Angel G. Angelov, Magnus Ekström","doi":"10.1007/s10182-022-00446-8","DOIUrl":"10.1007/s10182-022-00446-8","url":null,"abstract":"<div><p>The paper explores a testing problem which involves four hypotheses, that is, based on observations of two random variables <i>X</i> and <i>Y</i>, we wish to discriminate between four possibilities: identical survival functions, stochastic dominance of <i>X</i> over <i>Y</i>, stochastic dominance of <i>Y</i> over <i>X</i>, or crossing survival functions. Four-decision testing procedures for repeated measurements data are proposed. The tests are based on a permutation approach and do not rely on distributional assumptions. One-sided versions of the Cramér–von Mises, Anderson–Darling, and Kolmogorov–Smirnov statistics are utilized. The consistency of the tests is proven. A simulation study shows good power properties and control of false-detection errors. The suggested tests are applied to data from a psychophysical experiment.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"107 3","pages":"443 - 467"},"PeriodicalIF":1.4,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00446-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43319818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A myriad of physical, biological and other phenomena are better modeled with semi-infinite distribution families, in which case not knowing the population minimum becomes a hassle when performing parametric inference. Ad hoc methods to deal with this problem exist but are suboptimal and sometimes unfeasible. Besides, having the statistician handcraft solutions on a case-by-case basis is counterproductive. In this paper, we propose a framework under which the issue can be analyzed and perform an extensive search of the literature for methods that could be used to solve the aforementioned problem; we also propose a method of our own. Simulation experiments were then performed to compare some methods from the literature with our proposal. We found that the straightforward method, which is to infer the population minimum by maximum likelihood, has severe difficulty in giving a good estimate of the population minimum but manages to achieve very good inferred models. The other methods, including our proposal, involve estimating the population minimum, and we found that our method is superior to the other methods of this kind for the distributions simulated, followed very closely by the endpoint estimator of Alves et al. (Stat Sin 24(4):1811–1835, 2014). Although these two give much more accurate estimates of the population minimum, the straightforward method also displays some advantages, so choosing between these three methods will depend on the problem domain.
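The "straightforward method" mentioned above, fitting a location-shifted semi-infinite family by maximum likelihood, can be sketched as follows; the shifted gamma distribution and the simulated sample are purely illustrative and not taken from the paper.

```python
import numpy as np
from scipy import stats

# hypothetical sample from a gamma distribution shifted by an unknown minimum
rng = np.random.default_rng(0)
true_min = 5.0
sample = true_min + rng.gamma(shape=2.0, scale=1.5, size=500)

# "straightforward method": fit shape, location (= population minimum) and
# scale jointly by maximum likelihood
shape_hat, loc_hat, scale_hat = stats.gamma.fit(sample)
print(f"estimated population minimum (ML location): {loc_hat:.3f}")

# naive benchmark: the sample minimum, which is always above the true minimum
print(f"sample minimum: {sample.min():.3f}")
```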
{"title":"On dealing with the unknown population minimum in parametric inference","authors":"Matheus Henrique Junqueira Saldanha, Adriano Kamimura Suzuki","doi":"10.1007/s10182-022-00445-9","DOIUrl":"10.1007/s10182-022-00445-9","url":null,"abstract":"<div><p>A myriad of physical, biological and other phenomena are better modeled with semi-infinite distribution families, in which case not knowing the population minimum becomes a hassle when performing parametric inference. Ad hoc methods to deal with this problem exist, but are suboptimal and sometimes unfeasible. Besides, having the statistician handcraft solutions in a case-by-case basis is counterproductive. In this paper, we propose a framework under which the issue can be analyzed, and perform an extensive search in the literature for methods that could be used to solve the aforementioned problem; we also propose a method of our own. Simulation experiments were then performed to compare some methods from the literature and our proposal. We found that the straightforward method, which is to infer the population minimum by maximum likelihood, has severe difficulty in giving a good estimate for the population minimum, but manages to achieve very good inferred models. The other methods, including our proposal, involve estimating the population minimum, and we found that our method is superior to the other methods of this kind, considering the distributions simulated, followed very closely by the endpoint estimator by Alves et al. (Stat Sin 24(4):1811–1835, 2014). Although these two give much more accurate estimates for the population minimum, the straightforward method also displays some advantages, so choosing between these three methods will depend on the problem domain.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"107 3","pages":"509 - 535"},"PeriodicalIF":1.4,"publicationDate":"2022-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43197145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose the use of advanced and flexible statistical models to describe the spatial displacement of earthquake data. The paper aims to account for external geological information in the description of complex seismic point processes through the estimation of models with space-varying parameters. A local version of the log-Gaussian Cox process (LGCP) is introduced and applied for the first time, exploiting the inferential tools in Baddeley (Spat Stat 22:261–295, 2017) and estimating the model by the local Palm likelihood. We provide methods and approaches accounting for the interaction among points, typically described by LGCP models through the estimation of the covariance parameters of the Gaussian random field, which in this local version are allowed to vary in space, providing a more realistic description of the clustering feature of seismic events. Furthermore, we contribute to the framework of diagnostics, outlining suitable methods for the local context and proposing a new step-wise approach addressing the particular case of multiple covariates. Overall, we show that local models provide good inferential results and could serve as the basis for future spatio-temporal local model developments suited to the description of the complex seismic phenomenon.
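For readers unfamiliar with the model class, the following generic (non-local) LGCP specification may help fix ideas; the notation is ours, not the paper's.

```latex
% Log-Gaussian Cox process: conditional on the Gaussian random field Z,
% the points form a Poisson process with random intensity Lambda(u).
\[
  \Lambda(u) = \exp\{ Z(u) \}, \qquad
  Z(\cdot) \sim \mathrm{GRF}\bigl(\mu(\cdot),\, C(\cdot,\cdot)\bigr),
\]
% In the local version discussed above, the parameters of the covariance
% function C are allowed to vary over space.
```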
{"title":"Local spatial log-Gaussian Cox processes for seismic data","authors":"Nicoletta D’Angelo, Marianna Siino, Antonino D’Alessandro, Giada Adelfio","doi":"10.1007/s10182-022-00444-w","DOIUrl":"10.1007/s10182-022-00444-w","url":null,"abstract":"<div><p>In this paper, we propose the use of advanced and flexible statistical models to describe the spatial displacement of earthquake data. The paper aims to account for the external geological information in the description of complex seismic point processes, through the estimation of models with space varying parameters. A local version of the Log-Gaussian Cox processes (LGCP) is introduced and applied for the first time, exploiting the inferential tools in Baddeley (Spat Stat 22:261–295, 2017), estimating the model by the local Palm likelihood. We provide methods and approaches accounting for the interaction among points, typically described by LGCP models through the estimation of the covariance parameters of the Gaussian Random Field, that in this local version are allowed to vary in space, providing a more realistic description of the clustering feature of seismic events. Furthermore, we contribute to the framework of diagnostics, outlining suitable methods for the local context and proposing a new step-wise approach addressing the particular case of multiple covariates. Overall, we show that local models provide good inferential results and could serve as the basis for future spatio-temporal local model developments, peculiar for the description of the complex seismic phenomenon.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"106 4","pages":"633 - 671"},"PeriodicalIF":1.4,"publicationDate":"2022-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00444-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44906683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-14. DOI: 10.1007/s10182-022-00442-y
Claudio Giovanni Borroni, Lucio De Capitani
This paper deals with the estimation of kurtosis on large datasets. It aims at overcoming two frequent limitations in applications: first, Pearson's standardized fourth moment is computed as the only measure of kurtosis; second, the fact that data might be just samples is neglected, so that the opportunity to use suitable inferential tools, such as standard errors and confidence intervals, is discarded. In the paper, some recent indexes of kurtosis are reviewed as alternatives to Pearson's standardized fourth moment. The asymptotic distribution of their natural estimators is derived, and it is used as a tool to evaluate efficiency and to build confidence intervals. A simulation study is also conducted to provide practical indications about the choice of a suitable index. As a conclusion, researchers are warned against the use of the classical Pearson index when the sample size is too low and/or the distribution is skewed and/or heavy-tailed. Specifically, the occurrence of heavy tails can deprive Pearson's index of any meaning or produce unreliable confidence intervals. However, such limitations can be overcome by reverting to the reviewed alternative indexes, which rely just on low-order moments.
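As a minimal illustration of the inferential viewpoint advocated here (reporting an interval rather than a bare point value), the sketch below computes Pearson's standardized fourth moment together with a simple percentile-bootstrap confidence interval; this is not one of the asymptotic intervals derived in the paper, and the simulated data are hypothetical.

```python
import numpy as np

def pearson_kurtosis(x):
    """Pearson's standardized fourth moment (equals 3 for a normal law)."""
    x = np.asarray(x, dtype=float)
    m2 = np.mean((x - x.mean()) ** 2)
    m4 = np.mean((x - x.mean()) ** 4)
    return m4 / m2 ** 2

def bootstrap_ci(x, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    boots = np.array([stat(rng.choice(x, size=x.size, replace=True))
                      for _ in range(n_boot)])
    alpha = (1 - level) / 2
    return np.quantile(boots, [alpha, 1 - alpha])

x = np.random.default_rng(1).standard_t(df=5, size=300)   # heavy-ish tails
print(pearson_kurtosis(x), bootstrap_ci(x, pearson_kurtosis))
```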
{"title":"Some measures of kurtosis and their inference on large datasets","authors":"Claudio Giovanni Borroni, Lucio De Capitani","doi":"10.1007/s10182-022-00442-y","DOIUrl":"10.1007/s10182-022-00442-y","url":null,"abstract":"<div><p>This paper deals with the estimation of kurtosis on large datasets. It aims at overcoming two frequent limitations in applications: first, Pearson's standardized fourth moment is computed as a unique measure of kurtosis; second, the fact that data might be just samples is neglected, so that the opportunity of using suitable inferential tools, like standard errors and confidence intervals, is discarded. In the paper, some recent indexes of kurtosis are reviewed as alternatives to Pearson’s standardized fourth moment. The asymptotic distribution of their natural estimators is derived, and it is used as a tool to evaluate efficiency and to build confidence intervals. A simulation study is also conducted to provide practical indications about the choice of a suitable index. As a conclusion, researchers are warned against the use of classical Pearson’s index when the sample size is too low and/or the distribution is skewed and/or heavy-tailed. Specifically, the occurrence of heavy tails can deprive Pearson’s index of any meaning or produce unreliable confidence intervals. However, such limitations can be overcome by reverting to the reviewed alternative indexes, relying just on low-order moments.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"106 4","pages":"573 - 607"},"PeriodicalIF":1.4,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00442-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42575110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
External preference mapping is widely used in marketing and R&D divisions to understand consumer behaviour. The most common preference map is obtained through a two-step procedure that combines principal component analysis and least squares regression. The standard approach exploits classical regression and therefore focuses on the conditional mean. This paper proposes the use of quantile regression to enrich the preference map by looking at the whole distribution of consumer preference. The enriched maps highlight possibly different consumer behaviour with respect to the least and most preferred products. This is pursued by exploring the variability of liking along the principal components as well as by focusing on the direction of preference. The use of different aesthetics (colours, shapes, size, arrows) equips the standard preference map with additional information and does not force users to change the standard tool they are used to. The proposed methodology is shown in action on a case study pertaining to yogurt preferences.
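A stylized sketch of the two-step construction, with quantile regression replacing least squares in the second step, is given below; the product-by-attribute and product-by-consumer data are hypothetical, statsmodels' QuantReg is assumed for the quantile fits, and the paper's full methodology (aesthetics, direction of preference) is not reproduced.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
sensory = rng.normal(size=(20, 8))     # 20 products x 8 sensory attributes
liking = rng.normal(size=(20, 100))    # 20 products x 100 consumers

# Step 1: principal components of the sensory description of the products
scores = PCA(n_components=2).fit_transform(sensory)
X = sm.add_constant(scores)

# Step 2: for each consumer, regress liking on the two components at several
# quantiles instead of only at the conditional mean
quantiles = (0.25, 0.5, 0.75)
coefs = {
    q: np.array([sm.QuantReg(liking[:, j], X).fit(q=q).params
                 for j in range(liking.shape[1])])
    for q in quantiles
}
# coefs[q][j] holds (intercept, slope on PC1, slope on PC2) for consumer j at
# quantile q; plotting the slopes on the PCA map gives one map per quantile
print(coefs[0.5].shape)
```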
{"title":"A quantile regression perspective on external preference mapping","authors":"Cristina Davino, Tormod Næs, Rosaria Romano, Domenico Vistocco","doi":"10.1007/s10182-022-00440-0","DOIUrl":"10.1007/s10182-022-00440-0","url":null,"abstract":"<div><p>External preference mapping is widely used in marketing and R&D divisions to understand the consumer behaviour. The most common preference map is obtained through a two-step procedure that combines principal component analysis and least squares regression. The standard approach exploits classical regression and therefore focuses on the conditional mean. This paper proposes the use of quantile regression to enrich the preference map looking at the whole distribution of the consumer preference. The enriched maps highlight possible different consumer behaviour with respect to the less or most preferred products. This is pursued by exploring the variability of liking along the principal components as well as focusing on the direction of preference. The use of different aesthetics (colours, shapes, size, arrows) equips standard preference map with additional information and does not force the user to change the standard tool she/he is used to. The proposed methodology is shown in action on a case study pertaining yogurt preferences.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"106 4","pages":"545 - 571"},"PeriodicalIF":1.4,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-022-00440-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44379862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-08. DOI: 10.1007/s10182-022-00443-x
Wanling Xie, Hu Yang
In this work, we propose a novel group selection method called the Group Square-Root Elastic Net. It is based on square-root regularization with a group elastic net penalty, i.e., an $\ell_{2,1} + \ell_2$ penalty. As a square-root-based procedure, it has the distinct feature that the estimator is independent of the unknown noise level $\sigma$, which is non-trivial to estimate in the high-dimensional setting, especially when $p \gg n$. In many applications, the estimator is expected to be sparse, not in an irregular way but rather in a structured manner, which makes the proposed method very attractive for tackling both high dimensionality and structured sparsity. We study correct subset recovery under a Group Elastic Net Irrepresentable Condition. Both slow rate bounds and fast rate bounds are established, the latter under the Restricted Eigenvalue assumption and a Gaussian noise assumption. For implementation, a fast algorithm based on scaled multivariate thresholding-based iterative selection is introduced, with proven convergence. A comparative study examines the superiority of our approach over alternatives.
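The groupwise shrinkage at the heart of thresholding-based algorithms for $\ell_{2,1}$-type penalties can be sketched as a proximal step; the snippet below is a generic building block under standard assumptions, not the authors' Group Square-Root Elastic Net algorithm, and the toy data, group structure and penalty level are hypothetical.

```python
import numpy as np

def group_soft_threshold(beta, groups, thresh):
    """Proximal operator of thresh * sum_g ||beta_g||_2 (the l_{2,1} penalty):
    each group's coefficient block is shrunk towards zero by thresh, and set
    exactly to zero when its Euclidean norm falls below thresh."""
    out = np.zeros_like(beta)
    groups = np.asarray(groups)
    for g in set(groups.tolist()):
        idx = np.flatnonzero(groups == g)
        norm = np.linalg.norm(beta[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * beta[idx]
    return out

# toy usage: one proximal gradient step on an averaged least-squares loss
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 6)), rng.normal(size=50)
beta = np.zeros(6)
groups = [0, 0, 1, 1, 2, 2]            # three groups of two coefficients each
step = 1.0 / np.linalg.norm(X, 2) ** 2  # safe step from the spectral norm
grad = X.T @ (X @ beta - y) / len(y)
beta = group_soft_threshold(beta - step * grad, groups, thresh=0.1 * step)
print(beta)                             # threshold passed is step * lambda
```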
{"title":"Group sparse recovery via group square-root elastic net and the iterative multivariate thresholding-based algorithm","authors":"Wanling Xie, Hu Yang","doi":"10.1007/s10182-022-00443-x","DOIUrl":"10.1007/s10182-022-00443-x","url":null,"abstract":"<div><p>In this work, we propose a novel group selection method called Group Square-Root Elastic Net. It is based on square-root regularization with a group elastic net penalty, i.e., a <span>(ell _{2,1}+ell _2)</span> penalty. As a type of square-root-based procedure, one distinct feature is that the estimator is independent of the unknown noise level <span>(sigma )</span>, which is non-trivial to estimate under the high-dimensional setting, especially when <span>(pgg n)</span>. In many applications, the estimator is expected to be sparse, not in an irregular way, but rather in a structured manner. It makes the proposed method very attractive to tackle both high-dimensionality and structured sparsity. We study the correct subset recovery under a Group Elastic Net Irrepresentable Condition. Both the slow rate bounds and fast rate bounds are established, the latter under the Restricted Eigenvalue assumption and Gaussian noise assumption. To implement, a fast algorithm based on the scaled multivariate thresholding-based iterative selection idea is introduced with proved convergence. A comparative study examines the superiority of our approach against alternatives.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":"107 3","pages":"469 - 507"},"PeriodicalIF":1.4,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49272710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}