
The New England Journal of Statistics in Data Science: Latest Articles

Effect of Model Space Priors on Statistical Inference with Model Uncertainty
Pub Date : 2022-01-01 DOI: 10.51387/22-nejsds14
Anupreet Porwal, A. Raftery
Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors used in the literature and three adaptive parameter priors recommended by Porwal and Raftery [37]. We assess the performance of these combinations of prior specifications for variable selection in linear regression models for the statistical tasks of parameter estimation, interval estimation, inference, point and interval prediction. We carry out an extensive simulation study based on 14 real datasets representing a range of situations encountered in practice. We found that beta-binomial model space priors specified in terms of the prior probability of model size performed best on average across various statistical tasks and datasets, outperforming priors that were uniform across models. Recently proposed complexity priors performed relatively poorly.
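As a rough illustration of why a beta-binomial model space prior behaves differently from a prior that is uniform across models, the sketch below computes the prior mass each one places on model size. It assumes the standard beta-binomial form, in which each of p candidate predictors is included with a common probability that has a Beta(a, b) prior; the function names and the choice a = b = 1 are illustrative assumptions, not the specific priors compared in the article.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def uniform_model_prior(p, k):
    """Prior of one specific model when all 2**p models are equally likely.

    The model size k is unused: every model gets the same mass.
    """
    return 1.0 / 2 ** p


def beta_binomial_model_prior(p, k, a=1.0, b=1.0):
    """Prior of one specific model with k of p predictors.

    Assumes each predictor is included with probability w, w ~ Beta(a, b),
    and w is integrated out: B(a + k, b + p - k) / B(a, b).
    """
    return exp(log_beta(a + k, b + p - k) - log_beta(a, b))


def size_prior(model_prior, p, **kwargs):
    """Implied prior on model size: (number of models of size k) * (mass of each)."""
    return [comb(p, k) * model_prior(p, k, **kwargs) for k in range(p + 1)]


if __name__ == "__main__":
    p = 10
    for name, prior in [("uniform over models", uniform_model_prior),
                        ("beta-binomial(1, 1)", beta_binomial_model_prior)]:
        sizes = size_prior(prior, p)
        print(name, [round(x, 4) for x in sizes])
```

With p = 10, the uniform-over-models prior concentrates mass on mid-sized models (252/1024 on size 5, 1/1024 on the empty model), whereas the beta-binomial(1, 1) prior spreads mass evenly, 1/11 on each model size.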
Citations: 1
Frequentism
Pub Date : 2022-01-01 DOI: 10.51387/22-nejsds4a
A. van der Vaart
Discussion of “Four types of frequentism and their interplay with Bayesianism” by Jim Berger.
Citations: 7
On Bayesian Sequential Clinical Trial Designs
Pub Date : 2021-12-17 DOI: 10.51387/23-NEJSDS24
Tianjian Zhou, Yuan Ji
Clinical trials usually involve sequential patient entry. When designing a clinical trial, it is often desirable to include a provision for interim analyses of accumulating data with the potential for stopping the trial early. We review Bayesian sequential clinical trial designs based on posterior probabilities, posterior predictive probabilities, and decision-theoretic frameworks. A pertinent question is whether Bayesian sequential designs need to be adjusted for the planning of interim analyses. We answer this question from three perspectives: a frequentist-oriented perspective, a calibrated Bayesian perspective, and a subjective Bayesian perspective. We also provide new insights into the likelihood principle, which is commonly tied to statistical inference and decision making in sequential clinical trials. Some theoretical results are derived, and numerical studies are conducted to illustrate and assess these designs.
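A design based on posterior probabilities can be sketched for a single-arm trial with a binary outcome. The example below is a minimal illustration under assumed settings, not a design taken from the article: the Beta(1, 1) prior, the response-rate threshold of 0.3, and the cut-offs 0.95 and 0.05 are all hypothetical, as are the function names.

```python
from math import comb


def posterior_prob_exceeds(successes, failures, threshold, a=1, b=1):
    """P(theta > threshold | data) for a response rate theta.

    Assumes a Beta(a, b) prior with integer a and b, so the posterior is
    Beta(a + successes, b + failures). For integer shapes the Beta CDF
    equals a binomial tail: P(theta <= t) = P(Bin(n, t) >= alpha) with
    n = alpha + beta - 1, hence P(theta > t) = P(Bin(n, t) <= alpha - 1).
    """
    alpha = a + successes
    beta = b + failures
    n = alpha + beta - 1
    return sum(comb(n, j) * threshold ** j * (1 - threshold) ** (n - j)
               for j in range(alpha))


def interim_decision(successes, failures, threshold=0.3,
                     efficacy_cut=0.95, futility_cut=0.05):
    """Hypothetical stopping rule applied at one interim analysis."""
    prob = posterior_prob_exceeds(successes, failures, threshold)
    if prob >= efficacy_cut:
        return "stop for efficacy"
    if prob <= futility_cut:
        return "stop for futility"
    return "continue"


if __name__ == "__main__":
    for s, f in [(12, 8), (3, 7), (0, 15)]:
        print(s, f, interim_decision(s, f))
```

Whether cut-offs like these should be adjusted for the number of planned interim looks is exactly the question the article examines from its three perspectives; the sketch takes no position on it.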
Citations: 1
Poisson Modeling and Predicting English Premier League Goal Scoring
Pub Date : 2021-05-20 DOI: 10.51387/21-NEJSDS1
Quang-Nguyen Nguyen
The English Premier League is well-known for being not only one of the most popular professional sports leagues in the world, but also one of the toughest competitions to predict. The first purpose of this research was to verify the consistency between goal scoring in the English Premier League and the Poisson process; specifically, the relationships between the number of goals scored in a match and the Poisson distribution, the time between goals throughout the course of a season and the exponential distribution, and the time location of goals during football games and the continuous uniform distribution. We found that the Poisson process and the three probability distributions accurately describe Premier League goal scoring. In addition, Poisson regression was utilized to predict outcomes for a Premier League season, using different sets of season data and with a large number of simulations being involved. We examined and compared various soccer metrics from our simulation results, including an English club’s chances of being the champions, finishing in the top four and bottom three, and relegation points.
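The three distributional relationships mentioned above all follow from a homogeneous Poisson process, which the following sketch simulates. It is an illustration only: the rate of 2.7 goals per match, the 90-minute match length, and the function names are assumptions, not figures or code from the article.

```python
import math
import random


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def simulate_goal_times(rate_per_match, match_minutes=90.0, rng=None):
    """Simulate goal times (in minutes) for one match.

    Gaps between goals are drawn as exponential variables; under this
    construction the goal count is Poisson(rate_per_match) and, given the
    count, the goal times are uniform over the match.
    """
    rng = rng or random.Random()
    rate_per_minute = rate_per_match / match_minutes
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_minute)
        if t >= match_minutes:
            return times
        times.append(t)


if __name__ == "__main__":
    rng = random.Random(2021)
    rate = 2.7  # hypothetical average goals per match
    counts = [len(simulate_goal_times(rate, rng=rng)) for _ in range(20000)]
    for k in range(6):
        observed = counts.count(k) / len(counts)
        print(k, round(observed, 3), round(poisson_pmf(k, rate), 3))
```

Comparing the simulated frequencies with `poisson_pmf` mirrors, in miniature, the kind of consistency check the abstract describes for real match data.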
Citations: 2
New Perspectives on Centering
Pub Date : 2021-03-22 DOI: 10.51387/23-nejsds31
Jack Prothero, Jan Hannig, J. Marron
Data matrix centering is an ever-present yet under-examined aspect of data analysis. Functional data analysis (FDA) often operates with a default of centering such that the vectors in one dimension have mean zero. We find that centering along the other dimension identifies a novel useful mode of variation beyond those familiar in FDA. We explore ambiguities in both matrix orientation and nomenclature. Differences between centerings and their potential interaction can be easily misunderstood. We propose a unified framework and new terminology for centering operations. We clearly demonstrate the intuition behind and consequences of each centering choice with informative graphics. We also propose a new direction energy hypothesis test as part of a series of diagnostics for determining which choice of centering is best for a data set. We explore the application of these diagnostics in several FDA settings.
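To make the distinction between centering directions concrete, here is a small sketch on a plain list-of-lists matrix. The function names are illustrative assumptions and do not reproduce the terminology the article proposes; the sketch only shows that centering along one dimension, the other, or both gives different matrices.

```python
def column_center(matrix):
    """Subtract each column's mean, so every column has mean zero."""
    n_rows = len(matrix)
    col_means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, col_means)] for row in matrix]


def row_center(matrix):
    """Subtract each row's mean, so every row has mean zero."""
    return [[x - sum(row) / len(row) for x in row] for row in matrix]


def double_center(matrix):
    """Center columns, then rows; the result has zero row and column means."""
    return row_center(column_center(matrix))


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 9.0]]
    print("column-centered:", column_center(data))
    print("row-centered:   ", row_center(data))
    print("double-centered:", double_center(data))
```

For this matrix, column centering gives [[-1.5, -1.5, -3.0], [1.5, 1.5, 3.0]] and double centering gives [[0.5, 0.5, -1.0], [-0.5, -0.5, 1.0]], so the choice of centering changes which variation remains in the data.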
Citations: 3
A Compartment Model of Human Mobility and Early Covid-19 Dynamics in NYC
Pub Date : 2021-02-03 DOI: 10.51387/21-NEJSDS2
Ian Frankenburg, Sudipto Banerjee
In this paper, we build a mechanistic system to understand the relation between a reduction in human mobility and Covid-19 spread dynamics within New York City. To this end, we propose a multivariate compartmental model that jointly models smartphone mobility data and case counts during the first 90 days of the epidemic. Parameter calibration is achieved through the formulation of a general Bayesian hierarchical model to provide uncertainty quantification of resulting estimates. The open-source probabilistic programming language Stan is used for the requisite computation. Through sensitivity analysis and out-of-sample forecasting, we find our simple and interpretable model provides evidence that reductions in human mobility altered case dynamics.
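The general idea of linking mobility to transmission in a compartmental model can be sketched with a deterministic, discrete-time SIR model whose transmission rate is scaled by a mobility index. This is a deliberately simplified stand-in, not the authors' multivariate Bayesian hierarchical model fitted in Stan; the parameter values, the mobility series, and the function name are all hypothetical.

```python
def simulate_sir(mobility, beta0=0.5, gamma=0.2,
                 population=1_000_000.0, initial_infected=10.0):
    """Daily SIR dynamics with transmission rate beta0 * mobility[t].

    mobility is a sequence of relative mobility levels (1.0 = baseline).
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    path = []
    for m in mobility:
        # New infections cannot exceed the remaining susceptibles.
        new_infections = min(s, beta0 * m * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        path.append((s, i, r))
    return path


if __name__ == "__main__":
    days = 90
    baseline = simulate_sir([1.0] * days)
    # Hypothetical scenario: mobility drops to 40% of baseline after day 20.
    reduced = simulate_sir([1.0] * 20 + [0.4] * (days - 20))
    print("ever infected, baseline mobility:", round(1_000_000 - baseline[-1][0]))
    print("ever infected, reduced mobility: ", round(1_000_000 - reduced[-1][0]))
```

In the article the corresponding parameters are calibrated to data with uncertainty quantification rather than fixed by hand as they are here.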
Citations: 1
Comparison Between Bayesian and Frequentist Tail Probability Estimates
Pub Date : 2019-05-09 DOI: 10.51387/23-nejsds39
Nan Shen, B. González, L. Pericchi
Tail probability plays an important part in extreme value theory. Sometimes the conclusions from two approaches for estimating the tail probability of extreme events, the Bayesian and the frequentist methods, can differ a lot. In 1999, a rainfall that caused more than 30,000 deaths in Venezuela was not captured by the simple frequentist extreme value techniques. However, this catastrophic rainfall was not surprising if Bayesian inference was used to allow for parameter uncertainty and the full available data was exploited [4]. In this paper, we investigate the reasons that the Bayesian estimator of the tail probability is always higher than the frequentist estimator. Sufficient conditions for this phenomenon are established both by using Jensen’s Inequality and by looking at Taylor series approximations, both of which point to the convexity of the distribution function.
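The role of Jensen's inequality can be sketched as follows. This is one reading under stated assumptions, not the article's exact conditions: the frequentist estimate is taken to be a plug-in at the posterior mean of the parameter (the article's comparison may use a different plug-in, such as the maximum likelihood estimate), and the symbols \varphi, \theta, and x are introduced here for illustration.

```latex
% Tail probability as a function of the parameter:
\varphi(\theta) = P(X > x \mid \theta).

% Bayesian (posterior predictive) estimate averages over parameter uncertainty:
\hat{P}_{\mathrm{Bayes}}(X > x) = E\big[\varphi(\theta) \mid \text{data}\big].

% If \varphi is convex in \theta on the support of the posterior,
% Jensen's inequality gives
E\big[\varphi(\theta) \mid \text{data}\big]
  \;\ge\; \varphi\big(E[\theta \mid \text{data}]\big).

% Example: X \sim \text{Exponential with rate } \theta, so
% \varphi(\theta) = e^{-\theta x}, which is convex in \theta for every x > 0.
```

In the exponential example the averaged tail probability is therefore at least as large as the tail probability evaluated at the posterior mean rate, which matches the direction of the discrepancy described in the abstract.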
Citations: 1