
The New England Journal of Statistics in Data Science: Latest Articles

Some Noteworthy Issues in Joint Species Distribution Modeling for Plant Data
Pub Date : 2022-01-01 DOI: 10.51387/22-nejsds11
A. Gelfand
Joint species distribution modeling is attracting increasing attention in the literature these days, recognizing the fact that single species modeling fails to take into account expected dependence/interaction between species. This short paper offers discussion that attempts to illuminate five noteworthy technical issues associated with such modeling in the context of plant data. In this setting, the joint species distribution work in the literature considers several types of species data collection. For convenience of discussion, we focus on joint modeling of presence/absence data. For such data, the primary modeling strategy has been through introduction of latent multivariate normal random variables. These issues address the following: (i) how the observed presence/absence data is linked to the latent normal variables as well as the resulting implications with regard to modeling the data sites as independent or spatially dependent, (ii) the incompatibility of point referenced and areal referenced presence/absence data in spatial modeling of species distribution, (iii) the effect of modeling species independently/marginally rather than jointly within site, with regard to assessing species distribution, (iv) the interpretation of species dependence under the use of latent multivariate normal specification, and (v) the interpretation of clustering of species associated with specific joint species distribution modeling specifications. It is hoped that, by attempting to clarify these issues, ecological modelers and quantitative ecologists will be able to better appreciate some subtleties that are implicit in this growing collection of modeling ideas. In this regard, this paper can serve as a useful companion piece to the recent survey/comparison article by [33] in Methods in Ecology and Evolution.
Citations: 0
Dietary Patterns and Cancer Risk: An Overview with Focus on Methods
Pub Date : 2022-01-01 DOI: 10.51387/23-nejsds35
V. Edefonti, R. De Vito, M. Parpinel, M. Ferraroni
Traditionally, research in nutritional epidemiology has focused on specific foods/food groups or single nutrients in their relation with disease outcomes, including cancer. Dietary pattern analysis has been introduced to examine potential cumulative and interactive effects of individual dietary components of the overall diet, in which foods are consumed in combination. Dietary patterns can be identified by using evidence-based investigator-defined approaches or by using data-driven approaches, which rely on either response-independent (also named “a posteriori” dietary patterns) or response-dependent (also named “mixed-type” dietary patterns) multivariate statistical methods. Within the open methodological challenges related to study design, dietary assessment, identification of dietary patterns, confounding phenomena, and cancer risk assessment, the current paper provides an updated landscape review of novel methodological developments in the statistical analysis of a posteriori/mixed-type dietary patterns and cancer risk. The review starts from standard a posteriori dietary patterns from principal component, factor, and cluster analyses, including mixture models, and examines mixed-type dietary patterns from reduced rank regression, partial least squares, classification and regression tree analysis, and least absolute shrinkage and selection operator. Novel statistical approaches reviewed include Bayesian factor analysis with modeling of sparsity through shrinkage and sparse priors and frequentist focused principal component analysis. Most novelties relate to the reproducibility of dietary patterns across studies, where the potential of the Bayesian approach to factor and cluster analysis is best realized.
Citations: 1
An Optimal Two-Period Multiarm Platform Design with New Experimental Arms Added During the Trial
Pub Date : 2022-01-01 DOI: 10.51387/22-nejsds15
H. Pan, Xiaomeng Yuan, Jingjing Ye
Platform trials are multiarm clinical studies that allow the addition of new experimental arms after the activation of the trial. Statistical issues concerning “adding new arms”, however, have not been thoroughly discussed. This work was motivated by a “two-period” pediatric osteosarcoma study, starting with two experimental arms and one control arm and later adding two more pre-planned experimental arms. The common control arm will be shared among experimental arms across the trial. In this paper, we provide a principled approach, including how to modify the critical boundaries to control the family-wise error rate as new arms are added, how to re-estimate the sample sizes, and how to choose the optimal control-to-experimental arms allocation ratio that minimizes the total sample size needed to achieve a desirable marginal power level. We examined the influence of the timing of adding new arms on the design’s operating characteristics, which provides a practical guide for deciding the timing. Various other numerical evaluations have also been conducted. A method for controlling the pair-wise error rate (PWER) has also been developed. We have published an R package, PlatformDesign, on CRAN for practitioners to easily implement this platform trial approach. A detailed step-by-step tutorial is provided in Appendix A.2.
Citations: 2
Effect of Model Space Priors on Statistical Inference with Model Uncertainty
Pub Date : 2022-01-01 DOI: 10.51387/22-nejsds14
Anupreet Porwal, A. Raftery
Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors used in the literature and three adaptive parameter priors recommended by Porwal and Raftery [37]. We assess the performance of these combinations of prior specifications for variable selection in linear regression models for the statistical tasks of parameter estimation, interval estimation, inference, point and interval prediction. We carry out an extensive simulation study based on 14 real datasets representing a range of situations encountered in practice. We found that beta-binomial model space priors specified in terms of the prior probability of model size performed best on average across various statistical tasks and datasets, outperforming priors that were uniform across models. Recently proposed complexity priors performed relatively poorly.
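To make the contrast between a uniform prior over models and a beta-binomial prior on model size concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the Beta(1, 1) hyperparameters are a textbook choice that may differ from the settings compared in the paper.

```python
from math import comb


def uniform_model_prior(p):
    """Prior probability of any single model when all 2**p models are equally likely."""
    return 1.0 / 2 ** p


def beta_binomial_model_prior(k, p):
    """Prior probability of one specific model containing k of p predictors.

    Assumes each predictor enters with probability pi and pi ~ Beta(1, 1)
    (an illustrative choice). Integrating pi out gives
    k! (p - k)! / (p + 1)! = 1 / ((p + 1) * C(p, k)).
    """
    return 1.0 / ((p + 1) * comb(p, k))


def implied_size_prior(model_prior, p):
    """Prior probability of each model size 0..p implied by a per-model prior."""
    return [comb(p, k) * model_prior(k) for k in range(p + 1)]


p = 10  # hypothetical number of candidate predictors
uniform_sizes = implied_size_prior(lambda k: uniform_model_prior(p), p)
bb_sizes = implied_size_prior(lambda k: beta_binomial_model_prior(k, p), p)

# The uniform prior over models concentrates mass on mid-sized models,
# while this beta-binomial prior spreads mass evenly over model sizes.
for k in range(p + 1):
    print(k, round(uniform_sizes[k], 4), round(bb_sizes[k], 4))
```

The design point is that "uniform across models" is not uninformative about model size: it implies a Binomial(p, 1/2) distribution on the number of included predictors.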
Citations: 1
Frequentism
Pub Date : 2022-01-01 DOI: 10.51387/22-nejsds4a
A. van der Vaart
Discussion of “Four types of frequentism and their interplay with Bayesianism” by Jim Berger.
Citations: 7
On Bayesian Sequential Clinical Trial Designs
Pub Date : 2021-12-17 DOI: 10.51387/23-NEJSDS24
Tianjian Zhou, Yuan Ji
Clinical trials usually involve sequential patient entry. When designing a clinical trial, it is often desirable to include a provision for interim analyses of accumulating data with the potential for stopping the trial early. We review Bayesian sequential clinical trial designs based on posterior probabilities, posterior predictive probabilities, and decision-theoretic frameworks. A pertinent question is whether Bayesian sequential designs need to be adjusted for the planning of interim analyses. We answer this question from three perspectives: a frequentist-oriented perspective, a calibrated Bayesian perspective, and a subjective Bayesian perspective. We also provide new insights into the likelihood principle, which is commonly tied to statistical inference and decision making in sequential clinical trials. Some theoretical results are derived, and numerical studies are conducted to illustrate and assess these designs.
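The question of whether interim analyses call for adjustment can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a design from the paper: a single-arm trial with a binary endpoint, a Beta(1, 1) prior, and a rule that stops for efficacy once the posterior probability that the response rate exceeds a null value passes a fixed cutoff. All names and numeric settings (null rate 0.3, cutoff 0.95, look schedule) are hypothetical.

```python
import random
from math import comb


def posterior_prob_exceeds(successes, n, theta0):
    """P(theta > theta0 | data) under a Beta(1, 1) prior.

    The posterior is Beta(successes + 1, n - successes + 1). For integer
    parameters its CDF is a binomial tail sum, so only the standard
    library is needed: P(theta > theta0) = P(Bin(n + 1, theta0) <= successes).
    """
    m = n + 1
    return sum(comb(m, j) * theta0 ** j * (1 - theta0) ** (m - j)
               for j in range(successes + 1))


def efficacy_boundary(n, theta0, cutoff):
    """Smallest success count at which P(theta > theta0 | data) > cutoff."""
    for s in range(n + 1):
        if posterior_prob_exceeds(s, n, theta0) > cutoff:
            return s
    return n + 1  # boundary cannot be reached at this look


def false_positive_rate(looks, theta0=0.3, cutoff=0.95, n_sims=2000, seed=1):
    """Share of simulated null trials (true rate = theta0) that cross the boundary."""
    rng = random.Random(seed)
    boundaries = {n: efficacy_boundary(n, theta0, cutoff) for n in looks}
    rejections = 0
    for _ in range(n_sims):
        # Draw the full patient sequence up front so both schedules see the same data.
        outcomes = [rng.random() < theta0 for _ in range(looks[-1])]
        if any(sum(outcomes[:n]) >= boundaries[n] for n in looks):
            rejections += 1
    return rejections / n_sims


single_look = false_positive_rate([100])
five_looks = false_positive_rate([20, 40, 60, 80, 100])
print("one analysis:", single_look, "five analyses:", five_looks)
```

Comparing the two printed rates shows, from the frequentist-oriented perspective, why an unadjusted posterior cutoff may need calibration when looks are added; under a subjective Bayesian reading the posterior statement at each look is unchanged.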
Citations: 1
Poisson Modeling and Predicting English Premier League Goal Scoring
Pub Date : 2021-05-20 DOI: 10.51387/21-NEJSDS1
Quang-Nguyen Nguyen
The English Premier League is well-known for being not only one of the most popular professional sports leagues in the world, but also one of the toughest competitions to predict. The first purpose of this research was to verify the consistency between goal scoring in the English Premier League and the Poisson process; specifically, the relationships between the number of goals scored in a match and the Poisson distribution, the time between goals throughout the course of a season and the exponential distribution, and the time location of goals during football games and the continuous uniform distribution. We found that the Poisson process and the three probability distributions accurately describe Premier League goal scoring. In addition, Poisson regression was utilized to predict outcomes for a Premier League season, using different sets of season data and with a large number of simulations being involved. We examined and compared various soccer metrics from our simulation results, including an English club’s chances of being the champions, finishing in the top four and bottom three, and relegation points.
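The three distributional links named in the abstract all follow from a homogeneous Poisson process, which the following sketch simulates. It is a toy illustration rather than the paper's analysis: the scoring rate of 2.7 goals per match is a hypothetical value, the data are synthetic, and 380 matches is used because that is the size of one 20-team double round-robin season.

```python
import random
from statistics import mean, pvariance


def simulate_match(rate_per_match, rng, match_length=90.0):
    """Goal times (in minutes) for one match from a homogeneous Poisson process."""
    rate_per_minute = rate_per_match / match_length
    times, t = [], 0.0
    while True:
        # Waiting times between events of a Poisson process are exponential.
        t += rng.expovariate(rate_per_minute)
        if t > match_length:
            return times
        times.append(t)


rng = random.Random(2021)
matches = [simulate_match(2.7, rng) for _ in range(380)]  # 2.7 is hypothetical
goal_counts = [len(goals) for goals in matches]
all_times = [t for goals in matches for t in goals]

# Poisson counts: the mean and the variance should be close to each other.
print("mean goals per match:", round(mean(goal_counts), 2))
print("variance of goals per match:", round(pvariance(goal_counts), 2))
# Uniform goal times: the average goal minute should sit near the midpoint, 45.
print("mean goal minute:", round(mean(all_times), 1))
```

With real match data one would replace the simulated counts and times with observed ones and compare them to the same three reference distributions, for example with goodness-of-fit tests.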
Citations: 2
New Perspectives on Centering
Pub Date : 2021-03-22 DOI: 10.51387/23-nejsds31
Jack Prothero, Jan Hannig, J. Marron
Data matrix centering is an ever-present yet under-examined aspect of data analysis. Functional data analysis (FDA) often operates with a default of centering such that the vectors in one dimension have mean zero. We find that centering along the other dimension identifies a novel useful mode of variation beyond those familiar in FDA. We explore ambiguities in both matrix orientation and nomenclature. Differences between centerings and their potential interaction can be easily misunderstood. We propose a unified framework and new terminology for centering operations. We clearly demonstrate the intuition behind and consequences of each centering choice with informative graphics. We also propose a new direction energy hypothesis test as part of a series of diagnostics for determining which choice of centering is best for a data set. We explore the application of these diagnostics in several FDA settings.
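The difference between centering along one dimension, the other, or both can be seen on a tiny matrix. The sketch below is dependency-free and uses the generic labels "row" and "column", which are assumptions of this example and not the unified terminology the paper proposes; the data values are made up.

```python
def column_center(matrix):
    """Subtract each column's mean, so every column has mean zero."""
    n_rows = len(matrix)
    col_means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, col_means)] for row in matrix]


def row_center(matrix):
    """Subtract each row's mean, so every row has mean zero."""
    return [[x - sum(row) / len(row) for x in row] for row in matrix]


def double_center(matrix):
    """Apply column centering, then row centering."""
    return row_center(column_center(matrix))


# Made-up data built as (row effect) + (column effect), with no interaction.
data = [[1.0, 2.0, 6.0],
        [3.0, 4.0, 8.0]]

print(column_center(data))  # what remains is the row-to-row difference
print(row_center(data))     # what remains is the shared column profile
print(double_center(data))  # nothing remains for purely additive data
```

Each choice removes a different component, so the variation left for later analysis depends on the centering applied, which is why the choice deserves explicit diagnostics.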
Citations: 3
A Compartment Model of Human Mobility and Early Covid-19 Dynamics in NYC
Pub Date : 2021-02-03 DOI: 10.51387/21-NEJSDS2
Ian Frankenburg, Sudipto Banerjee
In this paper, we build a mechanistic system to understand the relation between a reduction in human mobility and Covid-19 spread dynamics within New York City. To this end, we propose a multivariate compartmental model that jointly models smartphone mobility data and case counts during the first 90 days of the epidemic. Parameter calibration is achieved through the formulation of a general Bayesian hierarchical model to provide uncertainty quantification of resulting estimates. The open-source probabilistic programming language Stan is used for the requisite computation. Through sensitivity analysis and out-of-sample forecasting, we find our simple and interpretable model provides evidence that reductions in human mobility altered case dynamics.
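As a rough illustration of how mobility can enter a compartmental model, the sketch below scales the contact rate of a basic SIR model by a daily mobility index. This is not the authors' multivariate model and involves no Bayesian calibration or Stan; the function name, the parameter values, the round population figure, and the mobility trajectory are all hypothetical.

```python
def simulate_sir(mobility, beta0=0.5, gamma=0.2,
                 population=8_400_000, initial_infected=100):
    """Daily Euler steps of an SIR model with contact rate beta0 * mobility[t].

    Returns the number of new infections on each day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    new_cases = []
    for m in mobility:
        infections = beta0 * m * s * i / population
        recoveries = gamma * i
        s, i, r = s - infections, i + infections - recoveries, r + recoveries
        new_cases.append(infections)
    return new_cases


days = 90  # matches the 90-day window mentioned in the abstract
baseline = [1.0] * days                       # mobility stays at its normal level
reduced = [1.0] * 20 + [0.4] * (days - 20)    # hypothetical drop after day 20

cases_baseline = simulate_sir(baseline)
cases_reduced = simulate_sir(reduced)
print("total infections, no reduction:", round(sum(cases_baseline)))
print("total infections, reduced mobility:", round(sum(cases_reduced)))
```

In a calibrated version, quantities such as `beta0` and `gamma` would be given priors and estimated jointly with the mobility series, so that forecasts carry uncertainty intervals.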
Citations: 1
Comparison Between Bayesian and Frequentist Tail Probability Estimates
Pub Date : 2019-05-09 DOI: 10.51387/23-nejsds39
Nan Shen, B. González, L. Pericchi
Tail probability plays an important part in extreme value theory. Sometimes the conclusions from two approaches for estimating the tail probability of extreme events, the Bayesian and the frequentist methods, can differ substantially. In 1999, a rainfall that caused more than 30,000 deaths in Venezuela was not captured by the simple frequentist extreme value techniques. However, this catastrophic rainfall was not surprising if Bayesian inference was used to allow for parameter uncertainty and the full available data was exploited [4]. In this paper, we investigate the reasons that the Bayesian estimator of the tail probability is always higher than the frequentist estimator. Sufficient conditions for this phenomenon are established both by using Jensen’s Inequality and by looking at Taylor series approximations, both of which point to the convexity of the distribution function.
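The role of convexity can be sketched in a few lines. The notation below is an assumption of this note, not the paper's: $\bar F(x \mid \theta)$ denotes the tail probability, $\hat\theta$ a plug-in estimate, and the plug-in is taken to equal the posterior mean for simplicity. The paper's exact conditions are not reproduced here.

```latex
% Tail probability for a given parameter value:
\bar F(x \mid \theta) = P(X > x \mid \theta).

% Plug-in (frequentist-style) and posterior predictive (Bayesian) estimates:
\hat p_{\mathrm{F}}(x) = \bar F(x \mid \hat\theta),
\qquad
\hat p_{\mathrm{B}}(x) = E_{\theta \mid \text{data}}\!\left[ \bar F(x \mid \theta) \right].

% If \theta \mapsto \bar F(x \mid \theta) is convex on the support of the
% posterior and \hat\theta = E[\theta \mid \text{data}], Jensen's inequality gives
\hat p_{\mathrm{B}}(x)
  = E\!\left[ \bar F(x \mid \theta) \mid \text{data} \right]
  \;\ge\; \bar F\!\left( x \mid E[\theta \mid \text{data}] \right)
  = \hat p_{\mathrm{F}}(x).

% Example: for an exponential model with rate \lambda,
% \bar F(x \mid \lambda) = e^{-\lambda x} is convex in \lambda for every x > 0,
% so the inequality applies.
```

Averaging a convex function over parameter uncertainty therefore cannot fall below the function evaluated at the average parameter, which is one way to read the abstract's reference to Jensen's Inequality.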
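The Jensen's inequality argument mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: the exponential model, the improper prior proportional to 1/λ, and the function names `frequentist_tail` and `bayesian_tail` are all assumptions chosen so that both estimates have closed forms. For exponential data with n observations summing to s, the plug-in estimate evaluates the tail exp(-λx) at the maximum likelihood rate n/s, while the Bayesian estimate averages the same tail over a Gamma(n, s) posterior whose mean is also n/s. Because exp(-λx) is convex in λ, the average cannot fall below the plug-in value.

```python
import math


def frequentist_tail(x, n, s):
    """Plug-in estimate of P(X > x): exponential tail at the MLE rate n / s."""
    rate_mle = n / s
    return math.exp(-rate_mle * x)


def bayesian_tail(x, n, s):
    """Posterior predictive P(X > x) under a Gamma(shape=n, rate=s) posterior.

    Assumes the improper prior 1/lambda (an illustrative choice), so that
    E[exp(-lambda * x)] has the closed form (s / (s + x)) ** n.
    """
    return (s / (s + x)) ** n


# Hypothetical summary statistics: 10 observations with total 20.0.
n, s = 10, 20.0
for x in (5.0, 10.0, 20.0):
    print(x, frequentist_tail(x, n, s), bayesian_tail(x, n, s))
```

In this toy setting the gap between the two estimates widens as x moves further into the tail, which matches the qualitative point of the abstract; the paper's actual sufficient conditions are more general than this single model.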
Citations: 1