
Statistical Science: Latest Articles

Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.
IF 3.9 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2024-11-01 Epub Date: 2024-10-30 DOI: 10.1214/24-sts936
Hani Doss, Antonio Linero
Consider a Bayesian setup in which we observe $Y$, whose distribution depends on a parameter $\theta$, that is, $Y \mid \theta \sim \pi_{Y \mid \theta}$. The parameter $\theta$ is unknown and treated as random, and a prior distribution chosen from some parametric family $\{\pi_\theta(\cdot\,; h),\ h \in \mathcal{H}\}$ is to be placed on it. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about $\theta$, but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on $\theta$ is estimated from the data. This is usually done by choosing the value of the hyperparameter $h$ that maximizes some criterion. Arguably the most common way of doing this is to let $m(h)$ be the marginal likelihood of $h$, that is, $m(h) = \int \pi_{Y \mid \theta}\, \nu_h(\theta)\, d\theta$ (here $\nu_h$ denotes the prior indexed by $h$), and choose the value of $h$ that maximizes $m(\cdot)$. Unfortunately, except for a handful of textbook examples, analytic evaluation of $\arg\max_h m(h)$ is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or don't scale well with the dimension of $h$, the dimension of $\theta$, or both. Second, we present a method for estimating $\arg\max_h m(h)$, based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let $g$ be a real-valued function of $\theta$, and let $I(h)$ be the posterior expectation of $g(\theta)$ when the prior is $\nu_h$. As a by-product of our approach, we show how to obtain point estimates and globally valid confidence bands for the family $I(h)$, $h \in \mathcal{H}$. To illustrate the scope of our methodology, we provide three detailed examples having different characters.
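To make the maximization of $m(h)$ concrete, here is a minimal sketch of one of the "textbook" cases in which the maximizer is available in closed form. It is not the paper's MCMC method. The model is an assumption chosen for illustration: $Y_i \mid \theta_i \sim N(\theta_i, 1)$ with prior $\theta_i \sim N(0, h)$, so that marginally $Y_i \sim N(0, 1 + h)$. The function names and the grid are hypothetical.

```python
import math
import random


def log_marginal(h, ys):
    """log m(h) for the assumed model Y_i | theta_i ~ N(theta_i, 1), theta_i ~ N(0, h)."""
    var = 1.0 + h  # integrating theta_i out gives Y_i ~ N(0, 1 + h)
    n = len(ys)
    sum_sq = sum(y * y for y in ys)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sum_sq / (2.0 * var)


def closed_form_argmax(ys):
    """Analytic maximizer: the marginal variance estimate minus 1, clipped at 0."""
    return max(0.0, sum(y * y for y in ys) / len(ys) - 1.0)


def grid_argmax(ys, grid):
    """Brute-force maximizer of log m(h) over a finite grid of h values."""
    return max(grid, key=lambda h: log_marginal(h, ys))


# Simulate data with a known prior variance (an arbitrary choice for the demo).
rng = random.Random(0)
true_h = 4.0
ys = [rng.gauss(0.0, math.sqrt(true_h)) + rng.gauss(0.0, 1.0) for _ in range(2000)]

grid = [0.01 * k for k in range(2001)]  # h in [0, 20], spacing 0.01
h_closed = closed_form_argmax(ys)
h_grid = grid_argmax(ys, grid)
print(f"closed form: {h_closed:.3f}, grid search: {h_grid:.3f}")
```

The grid search only works here because $h$ is one-dimensional and $m(h)$ is cheap to evaluate; the abstract's point is that neither holds in general.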
Statistical Science, vol. 39, no. 4, pp. 601-622. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654829/pdf/
Citations: 0
Variable Selection Using Bayesian Additive Regression Trees.
IF 3.9 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2024-05-01 Epub Date: 2024-05-05 DOI: 10.1214/23-sts900
Chuji Luo, Michael J Daniels

Variable selection is an important statistical problem. This problem becomes more challenging when the candidate predictors are of mixed type (e.g. continuous and binary) and impact the response variable in nonlinear and/or non-additive ways. In this paper, we review existing variable selection approaches for the Bayesian additive regression trees (BART) model, a nonparametric regression model, which is flexible enough to capture the interactions between predictors and nonlinear relationships with the response. An emphasis of this review is on the ability to identify relevant predictors. We also propose two variable importance measures which can be used in a permutation-based variable selection approach, and a backward variable selection procedure for BART. We introduce these variations as a way of illustrating limitations and opportunities for improving current approaches and assess these via simulations.

Statistical Science, vol. 39, no. 2, pp. 286-304. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11395240/pdf/
Citations: 0
Methods for Integrating Trials and Non-experimental Data to Examine Treatment Effect Heterogeneity
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts890
Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires dealing with potential confounding, and also enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials and/or observational datasets. With many new methods available for assessing treatment effect heterogeneity using multiple studies, it is important to understand which methods are best used in which setting, how the methods compare to one another, and what needs to be done to continue progress in this field. This paper reviews these methods broken down by data setting: aggregate-level data, federated learning, and individual participant-level data. We define the conditional average treatment effect and discuss differences between parametric and nonparametric estimators, and we list key assumptions, both those that are required within a single study and those that are necessary for data combination. After describing existing approaches, we compare and contrast them and reveal open areas for future research. This review demonstrates that there are many possible approaches for estimating treatment effect heterogeneity through the combination of datasets, but that there is substantial work to be done to compare these methods through case studies and simulations, extend them to different settings, and refine them to account for various challenges present in real data.
Authors: Carly Lupton Brantner, Ting-Hsuan Chang, Trang Quynh Nguyen, Hwanhee Hong, Leon Di Stefano, Elizabeth A. Stuart
Citations: 5
Editorial: Special Issue on Reproducibility and Replicability
IF 5.7 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts909
Authors: Alicia L Carriquiry, Michael J. Daniels, Nancy M Reid
Citations: 0
Replication Success Under Questionable Research Practices—a Simulation Study
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts904
Increasing evidence suggests that the reproducibility and replicability of scientific findings are threatened by researchers employing questionable research practices (QRPs) in order to achieve statistically significant results. Numerous metrics have been developed to determine replication success, but it has not yet been investigated how well those metrics perform in the presence of QRPs. This paper aims to compare the performance of different metrics quantifying replication success in the presence of four types of QRPs: cherry picking of outcomes, questionable interim analyses, questionable inclusion of covariates, and questionable subgroup analyses. Our results show that the metric based on the version of the sceptical p-value that is recalibrated in terms of effect size performs better in maintaining low values of overall type-I error rate, but often requires larger replication sample sizes compared to metrics based on significance, the controlled version of the sceptical p-value, meta-analysis or Bayes factors, especially when severe QRPs are employed.
Authors: Francesca Freuli, Leonhard Held, Rachel Heyard
Citations: 2
Game-Theoretic Statistics and Safe Anytime-Valid Inference
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts894
Aaditya Ramdas, Peter Grünwald, Vladimir Vovk, Glenn Shafer
Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty—e-processes for testing and confidence sequences for estimation—that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative martingales starting at one. Since a test martingale is the wealth process of a player in a betting game, SAVI centrally employs game-theoretic intuition, language and mathematics. We summarize the SAVI goals and philosophy, and report recent advances in testing composite hypotheses and estimating functionals in nonparametric settings.
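The betting interpretation above can be sketched in a few lines. This is an illustrative toy, not taken from the paper: the null hypothesis (a fair coin), the fixed betting fraction `lam`, and the function names are all assumptions. The rejection rule uses Ville's inequality, a standard result not named in the abstract, which bounds by $\alpha$ the probability that a nonnegative martingale starting at one ever reaches $1/\alpha$.

```python
import itertools


def wealth_process(xs, lam=0.5):
    """Wealth of a bettor who starts with 1 and stakes a fraction lam on heads.

    Under H0: P(x = 1) = 1/2 each round multiplies wealth by 1 + lam or
    1 - lam with equal probability, so the expected multiplier is 1 and the
    wealth is a nonnegative martingale starting at one (a test martingale).
    """
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1] to keep wealth nonnegative")
    wealth = 1.0
    path = []
    for x in xs:
        payoff = 1.0 if x == 1 else -1.0
        wealth *= 1.0 + lam * payoff
        path.append(wealth)
    return path


def first_rejection_time(path, alpha=0.05):
    """First time the wealth reaches 1/alpha, or None if it never does."""
    for t, wealth in enumerate(path, start=1):
        if wealth >= 1.0 / alpha:
            return t
    return None


# A run of heads is strong evidence against a fair coin: wealth grows as 1.5**t.
all_heads = wealth_process([1] * 10)
print(first_rejection_time(all_heads))  # 8, since 1.5**7 < 20 <= 1.5**8

# Martingale check under H0: average final wealth over all equally likely
# length-3 sequences equals the starting wealth.
finals = [wealth_process(seq)[-1] for seq in itertools.product([0, 1], repeat=3)]
print(sum(finals) / len(finals))  # 1.0
```

Because the guarantee holds uniformly over time, the bettor may inspect the wealth after every observation and stop whenever the threshold is crossed, which is the "anytime-valid" property the abstract describes.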
Citations: 0
Online Multiple Hypothesis Testing
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts901
Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of maintaining control of a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complexity is that hypotheses are tested in an online manner, one-by-one over time. However, traditional procedures that control the FDR, such as the Benjamini-Hochberg procedure, assume that all p-values are available to be tested at a single time point. To address these challenges, a new field of methodology has developed over the past 15 years showing how to control error rates for online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based both on the evidence against it, and on the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, with a review of key theory as well as a focus on applied examples. We also provide simulation results comparing different online testing algorithms and an up-to-date overview of the many methodological extensions that have been proposed.
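As a rough illustration of a decision rule that depends on both the current p-value and earlier rejections, the sketch below implements one procedure from this literature, LOND (Javanmard and Montanari), as I understand it. The abstract does not single out this procedure. Hypothesis $i$ is tested at level $\alpha \gamma_i (D_{i-1} + 1)$, where $D_{i-1}$ counts earlier rejections and the weights $\gamma_i$ sum to one. The specific weights $\gamma_i = 6 / (\pi^2 i^2)$ and the function name are assumptions made for the example, and the FDR guarantee relies on conditions (such as independent p-values) stated in the original literature.

```python
import math


def lond(p_values, alpha=0.05):
    """Process p-values one at a time; return the list of reject decisions."""
    decisions = []
    discoveries = 0  # number of rejections made so far
    for i, p in enumerate(p_values, start=1):
        gamma_i = 6.0 / (math.pi ** 2 * i ** 2)  # nonnegative weights summing to 1
        level = alpha * gamma_i * (discoveries + 1)  # grows with past discoveries
        reject = p <= level
        decisions.append(reject)
        discoveries += int(reject)
    return decisions


# The third p-value (0.004) is rejected here because an earlier discovery
# has raised its test level...
print(lond([0.001, 0.5, 0.004, 0.2, 0.003]))  # [True, False, True, False, True]

# ...but the same p-value is not rejected when nothing was discovered before it.
print(lond([0.5, 0.5, 0.004]))  # [False, False, False]
```

Unlike the Benjamini-Hochberg procedure, each decision here is final when it is made and uses only the p-values seen so far.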
Authors: David S. Robertson, James M. S. Wason, Aaditya Ramdas
Citations: 1
Distributionally Robust and Generalizable Inference
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts902
We discuss recently developed methods that quantify the stability and generalizability of statistical findings under distributional changes. In many practical problems, the data is not drawn i.i.d. from the target population. For example, unobserved sampling bias, batch effects, or unknown associations might inflate the variance compared to i.i.d. sampling. For reliable statistical inference, it is thus necessary to account for these types of variation. We discuss and review two methods that allow one to quantify distribution stability based on a single dataset. The first method computes the sensitivity of a parameter under worst-case distributional perturbations to understand which types of shift pose a threat to external validity. The second method treats distributional shifts as random, which allows one to assess average robustness (instead of worst-case robustness). Based on a stability analysis of multiple estimators on a single dataset, it integrates both sampling and distributional uncertainty into a single confidence interval.
Authors: Dominik Rothenhäusler, Peter Bühlmann
Citations: 3
Replicability Across Multiple Studies
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts892
Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive since discoveries are possible even when all the individual studies are underpowered. However, the meta-analytic discoveries may be entirely driven by signal in a single study, and thus nonreplicable. Although the great majority of meta-analyses carried out to date do not infer on the replicability of their findings, it is possible to do so. We provide a selective overview of analyses that can be carried out towards establishing replicability of the scientific findings. We describe methods for the setting where a single outcome is examined in multiple studies (as is common in systematic reviews of medical interventions), as well as for the setting where multiple studies each examine multiple features (as in genomics applications). We also discuss some of the current shortcomings and future directions.
元分析在许多科学学科中是常规的。这种分析是有吸引力的,因为即使在所有单独的研究都不足的情况下,发现也是可能的。然而,元分析的发现可能完全是由单一研究中的信号驱动的,因此不可复制。尽管迄今为止进行的绝大多数荟萃分析都没有推断出他们的发现的可重复性,但这样做是可能的。我们提供了一个选择性的分析概述,可以朝着建立科学发现的可重复性进行。我们描述了在多个研究中检查单个结果的设置(如在医疗干预的系统评价中常见)以及多个研究每个检查多个特征的设置(如在基因组学应用中)的方法。我们还讨论了目前的一些不足和未来的发展方向。
{"title":"Replicability Across Multiple Studies","authors":"Marina Bogomolov, Ruth Heller","doi":"10.1214/23-sts892","DOIUrl":"https://doi.org/10.1214/23-sts892","url":null,"abstract":"Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive since discoveries are possible even when all the individual studies are underpowered. However, the meta-analytic discoveries may be entirely driven by signal in a single study, and thus nonreplicable. Although the great majority of meta-analyses carried out to date do not infer on the replicability of their findings, it is possible to do so. We provide a selective overview of analyses that can be carried out towards establishing replicability of the scientific findings. We describe methods for the setting where a single outcome is examined in multiple studies (as is common in systematic reviews of medical interventions), as well as for the setting where multiple studies each examine multiple features (as in genomics applications). We also discuss some of the current shortcomings and future directions.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135509930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Defining Replicability of Prediction Rules
Zone 1, Mathematics Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts891
Giovanni Parmigiani
In this article, I propose an approach for defining replicability for prediction rules. Motivated by a recent report by the U.S.A. National Academy of Sciences, I start from the perspective that replicability is obtaining consistent results across studies suitable to address the same prediction question, each of which has obtained its own data. I then discuss concept and issues in defining key elements of this statement. I focus specifically on the meaning of “consistent results” in typical utilization contexts, and propose a multi-agent framework for defining replicability, in which agents are neither allied nor adversaries. I recover some of the prevalent practical approaches as special cases. I hope to provide guidance for a more systematic assessment of replicability in machine learning.
Citations: 0