
Asta-Advances in Statistical Analysis: Latest Articles

On the role of data, statistics and decisions in a pandemic
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-04-07 | DOI: 10.1007/s10182-022-00439-7
Beate Jahn, Sarah Friedrich, Joachim Behnke, Joachim Engel, Ursula Garczarek, Ralf Münnich, Markus Pauly, Adalbert Wilhelm, Olaf Wolkenhauer, Markus Zwick, Uwe Siebert, Tim Friede

A pandemic poses particular challenges to decision-making because of the need to continuously adapt decisions to rapidly changing evidence and available data. For example, which countermeasures are appropriate at a particular stage of the pandemic? How can the severity of the pandemic be measured? What is the effect of vaccination in the population and which groups should be vaccinated first? The process of decision-making starts with data collection and modeling and continues to the dissemination of results and the subsequent decisions taken. The goal of this paper is to give an overview of this process and to provide recommendations for the different steps from a statistical perspective. In particular, we discuss a range of modeling techniques including mathematical, statistical and decision-analytic models along with their applications in the COVID-19 context. With this overview, we aim to foster the understanding of the goals of these modeling approaches and the specific data requirements that are essential for the interpretation of results and for successful interdisciplinary collaborations. A special focus is on the role played by data in these different models, and we incorporate into the discussion the importance of statistical literacy and of effective dissemination and communication of findings.

Asta-Advances in Statistical Analysis, vol. 106, no. 3, pp. 349-382. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00439-7.pdf
Citations: 12
Imputation-based empirical likelihood inferences for partially nonlinear quantile regression models with missing responses
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-04-06 | DOI: 10.1007/s10182-022-00441-z
Xiaoshuang Zhou, Peixin Zhao, Yujie Gai

In this paper, we consider confidence interval construction for partially nonlinear models with responses missing at random under the framework of quantile regression. We propose an imputation-based empirical likelihood method to construct statistical inferences for both the unknown parametric vector in the nonlinear function and the nonparametric function, and show that the proposed empirical log-likelihood ratios are both asymptotically chi-squared. Furthermore, the confidence region for the parametric vector and the pointwise confidence interval for the nonparametric function are constructed. Simulation studies are implemented to assess the performance of the proposed estimation method, and the results indicate that the proposed method is workable.

Asta-Advances in Statistical Analysis, vol. 106, no. 4, pp. 705-722. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00441-z.pdf
Citations: 1
Correction to: Assessment of agricultural sustainability in European Union countries: a group-based multivariate trajectory approach
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-03-17 | DOI: 10.1007/s10182-022-00438-8
Alessandro Magrini
Asta-Advances in Statistical Analysis, vol. 106, no. 3, pp. 525-526. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00438-8.pdf
Citations: 2
On the Gaussian representation of the Riesz probability distribution on symmetric matrices
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-03-06 | DOI: 10.1007/s10182-022-00436-w
Abdelhamid Hassairi, Fatma Ktari, Raoudha Zine

The Riesz probability distribution on symmetric matrices represents an important extension of the Wishart distribution. It is defined by its Laplace transform, which involves the notion of generalized power. Based on the fact that some Wishart distributions are represented by means of the multivariate Gaussian distribution, it is shown that some Riesz probability distributions which are not necessarily Wishart are also represented by means of Gaussian samples with missing data. As a corollary, we deduce a Gaussian representation of the inverse Riesz distribution and we give its expectation. The results are assessed in simulation studies.

Asta-Advances in Statistical Analysis, vol. 106, no. 4, pp. 609-632. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00436-w.pdf
Citations: 0
Assessment of agricultural sustainability in European Union countries: a group-based multivariate trajectory approach
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-03-05 | DOI: 10.1007/s10182-022-00437-9
Alessandro Magrini

Sustainability of agriculture is difficult to measure and assess because it is a multidimensional concept that involves economic, social and environmental aspects and is subject to temporal evolution and geographical differences. Existing studies assessing agricultural sustainability in the European Union (EU) are affected by several shortcomings that limit their relevance for policy makers. Specifically, most of them focus on the farm level or cover a small set of countries, and the few exceptions covering a broad set of countries consider only a subset of the sustainability dimensions or rely on cross-sectional data. In this paper, we consider yearly data on 12 indicators (5 for the economic, 3 for the social and 4 for the environmental dimension) measured on 26 EU countries in the period 2004–2018 (15 years), and apply group-based multivariate trajectory modeling to identify groups of countries with common trends in sustainable objectives. An expectation-maximization algorithm is proposed to perform maximum likelihood estimation from incomplete data without relying on an explicit imputation procedure. Our results highlight three groups of countries with distinct strong and weak sustainable objectives. Strong objectives common to all three groups include improvement of productivity, increase of personal income in rural areas, reduction of poverty in rural areas, increase of production of renewable energy, rise of organic farming and reduction of nitrogen balance. In contrast, enhancement of manager turnover and reduction of greenhouse gas emissions are weak objectives common to all three groups of countries. Our findings represent a valuable resource to formulate new schemes for the attribution of subsidies within the Common Agricultural Policy (CAP).

Asta-Advances in Statistical Analysis, vol. 106, no. 4, pp. 673-703. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00437-9.pdf
Citations: 13
Correction to: Assessment of agricultural sustainability in European Union countries: a group-based multivariate trajectory approach
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-03-05 | DOI: 10.1007/s10182-022-00437-9
Alessandro Magrini
Asta-Advances in Statistical Analysis, vol. 106, no. 1, pp. 525-526.
Citations: 13
Action rate models for predicting actions in soccer
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-03-02 | DOI: 10.1007/s10182-022-00435-x
Uwe Dick, Ulf Brefeld

We present a data-driven approach to predict the next action in soccer. We focus on passing actions of the ball-possessing player and aim to forecast the pass itself and the time at which the pass will be played. At the same time, our model estimates the probability that the player loses possession of the ball before she can perform the action. Our approach consists of parameterized exponential rate models for all possible actions that are adapted to historic data with graph recurrent neural networks to account for inter-dependencies of the output space (i.e., the possible actions). We report on empirical results.
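
The idea of competing exponential rate models can be illustrated with a small sketch. It is not the authors' model: the rates below are fixed, hypothetical constants (in the paper they are parameterized and learned with graph recurrent neural networks), and the waiting times are assumed to be independent. Under those assumptions, the probability that an action happens first is its rate divided by the sum of all rates, and the expected time until any action is the reciprocal of that sum.

```python
def first_action_probabilities(rates):
    """Probability that each action occurs first, assuming independent
    exponential waiting times with the given (hypothetical) rates."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("the rates must sum to a positive value")
    return {action: rate / total for action, rate in rates.items()}


def expected_time_to_first_action(rates):
    """Expected waiting time until the first action of any kind occurs."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("the rates must sum to a positive value")
    return 1.0 / total


# Hypothetical rates per second: two pass targets and a loss of possession.
example_rates = {"pass_to_A": 0.3, "pass_to_B": 0.1, "lose_ball": 0.1}
probabilities = first_action_probabilities(example_rates)
waiting_time = expected_time_to_first_action(example_rates)
```

Treating loss of possession as one more competing event is what lets a single set of rates answer both "which action" and "when".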

Asta-Advances in Statistical Analysis, vol. 107, no. 1-2, pp. 29-49. PDF: https://link.springer.com/content/pdf/10.1007/s10182-022-00435-x.pdf
Citations: 1
Scoring predictions at extreme quantiles
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-02-14 | DOI: 10.1007/s10182-021-00421-9
Axel Gandy, Kaushik Jana, Almut E. D. Veraart

Prediction of quantiles at extreme tails is of interest in numerous applications. Extreme value modelling provides various competing predictors for this point prediction problem. A common method of assessing a set of competing predictors is to evaluate their predictive performance in a given situation. However, due to the extreme nature of this inference problem, the predicted quantiles may not be seen in the historical records, particularly when the sample size is small. This situation poses a problem for validating the prediction against its realization. In this article, we propose two non-parametric scoring approaches to assess extreme quantile prediction mechanisms. The proposed assessment methods are based on predicting a sequence of equally extreme quantiles on different parts of the data. We then use the quantile scoring function to evaluate the competing predictors. The performance of the scoring methods is compared with the conventional scoring method, and the superiority of the former methods is demonstrated in a simulation study. The methods are then applied to analyze cyber Netflow data from Los Alamos National Laboratory and daily precipitation data at a station in California available from the Global Historical Climatology Network.
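
For readers unfamiliar with the quantile scoring function mentioned above, the following is a minimal sketch of the standard quantile (pinball) loss. It illustrates only the conventional score, not the two scoring approaches proposed in the article; the function name and the example numbers are hypothetical.

```python
def quantile_score(observations, predictions, tau):
    """Average quantile (pinball) loss of quantile predictions at level tau.

    Lower values indicate better predictions of the tau-quantile.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(observations) != len(predictions) or not observations:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for y, q in zip(observations, predictions):
        diff = y - q
        # Under-prediction (y above q) is weighted by tau,
        # over-prediction (y below q) by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observations)


# Hypothetical example: two observations scored against a constant
# prediction of the 0.9-quantile.
example_score = quantile_score([1.0, 3.0], [2.0, 2.0], 0.9)
```

At an extreme level such as tau = 0.999, very few observations exceed the predicted quantile, so the average is dominated by the over-prediction term; this is the small-sample difficulty the abstract describes.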

Asta-Advances in Statistical Analysis, vol. 106, no. 4, pp. 527-544.
Citations: 3
Flexible models for non-equidispersed count data: comparative performance of parametric models to deal with underdispersion
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-02-03 | DOI: 10.1007/s10182-021-00432-6
Douglas Toledo, Cristiane Akemi Umetsu, Antonio Fernando Monteiro Camargo, Idemauro Antonio Rodrigues de Lara

Count data as response variables are commonly modeled using Poisson regression models, which require equidispersion, i.e., equal mean and variance. However, this relationship does not always hold, and the variance may be higher or lower than the mean, phenomena known as overdispersion and underdispersion, respectively. Non-equidispersion, when disregarded, can lead to a number of misinterpretations and inadequate predictions. Here, we compare the use of the COM-Poisson, double Poisson, Gamma-count, and restricted generalized Poisson models as a more flexible class for count problems associated with over- and underdispersion, since they have an additional parameter that allows more flexible analysis. The proposed method is useful in different applications, but here we provide an example using an underdispersed dataset concerning ecological invasion. For validation of the models, we use half-normal plots. The COM-Poisson, double Poisson, and Gamma-count models performed best and properly modeled the underdispersion. The use of correct statistical models, chosen by objective criteria, is recommended to handle this data property and to ensure accurate statistical inferences.
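
A quick descriptive check of equidispersion is the ratio of sample variance to sample mean. The sketch below is an illustration only: it ignores covariates, so it is not a substitute for the regression models compared in the article, and the function name and counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Ratio of sample variance to sample mean for a sequence of counts.

    A value near 1 is consistent with Poisson equidispersion, values above 1
    suggest overdispersion, and values below 1 suggest underdispersion.
    """
    average = mean(counts)
    if average == 0:
        raise ValueError("the dispersion index is undefined for a zero mean")
    return variance(counts) / average


# Hypothetical counts that vary less than a Poisson sample with mean 3 would.
example_counts = [2, 3, 3, 4, 3]
example_index = dispersion_index(example_counts)
```

Here the sample mean is 3 and the sample variance is 0.5, so the index is well below 1, the pattern the abstract calls underdispersion.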

Asta-Advances in Statistical Analysis, vol. 106, no. 3, pp. 473-497. PDF: https://link.springer.com/content/pdf/10.1007/s10182-021-00432-6.pdf
Citations: 3
Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials
IF 1.4 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2022-01-25 | DOI: 10.1007/s10182-021-00431-7
Ryan A. Peterson, Joseph E. Cavanaugh

We explore and illustrate the concept of ranked sparsity, a phenomenon that often occurs naturally in modeling applications when an expected disparity exists in the quality of information between different feature sets. Its presence can cause traditional and modern model selection methods to fail because such procedures commonly presume that each potential parameter is equally worthy of entering into the final model—we call this presumption “covariate equipoise.” However, this presumption does not always hold, especially in the presence of derived variables. For instance, when all possible interactions are considered as candidate predictors, the premise of covariate equipoise will often produce over-specified and opaque models. The sheer number of additional candidate variables grossly inflates the number of false discoveries in the interactions, resulting in unnecessarily complex and difficult-to-interpret models with many (truly spurious) interactions. We suggest a modeling strategy that requires a stronger level of evidence in order to allow certain variables (e.g., interactions) to be selected in the final model. This ranked sparsity paradigm can be implemented with the sparsity-ranked lasso (SRL). We compare the performance of SRL relative to competing methods in a series of simulation studies, showing that the SRL is a very attractive method because it is fast and accurate and produces more transparent models (with fewer false interactions). We illustrate its utility in an application to predict the survival of lung cancer patients using a set of gene expression measurements and clinical covariates, searching in particular for gene–environment interactions.
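
To make the point about the sheer number of candidate interactions concrete, the following sketch counts main effects and pairwise interactions for a given number of features. It shows only the combinatorics behind the argument, not the sparsity-ranked lasso itself; the function name is hypothetical.

```python
from math import comb


def candidate_counts(n_features):
    """Return (number of main effects, number of pairwise interactions)."""
    if n_features < 0:
        raise ValueError("n_features must be non-negative")
    return n_features, comb(n_features, 2)


# With 10 features there are already more interactions than main effects.
main_effects, interactions = candidate_counts(10)
```

Because the number of pairwise interactions grows quadratically in the number of features, treating every candidate as equally plausible gives the interactions far more opportunities to enter a model by chance.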

Asta-Advances in Statistical Analysis, vol. 106, no. 3, pp. 427-454. PDF: https://link.springer.com/content/pdf/10.1007/s10182-021-00431-7.pdf
Citations: 1