
Statistical Science: Latest Articles

Methods for Integrating Trials and Non-experimental Data to Examine Treatment Effect Heterogeneity
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts890
Carly Lupton Brantner, Ting-Hsuan Chang, Trang Quynh Nguyen, Hwanhee Hong, Leon Di Stefano, Elizabeth A. Stuart
Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires dealing with potential confounding, and also enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials and/or observational datasets. With many new methods available for assessing treatment effect heterogeneity using multiple studies, it is important to understand which methods are best used in which setting, how the methods compare to one another, and what needs to be done to continue progress in this field. This paper reviews these methods broken down by data setting: aggregate-level data, federated learning, and individual participant-level data. We define the conditional average treatment effect and discuss differences between parametric and nonparametric estimators, and we list key assumptions, both those that are required within a single study and those that are necessary for data combination. After describing existing approaches, we compare and contrast them and reveal open areas for future research. This review demonstrates that there are many possible approaches for estimating treatment effect heterogeneity through the combination of datasets, but that there is substantial work to be done to compare these methods through case studies and simulations, extend them to different settings, and refine them to account for various challenges present in real data.
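For orientation, the conditional average treatment effect named in the abstract is usually written in potential-outcomes notation as below. The symbols are assumptions introduced here, not taken from the abstract: Y(1) and Y(0) are potential outcomes, A is the treatment indicator, X the covariates, and S a study label.

```latex
% CATE at covariate value x (assumed notation)
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]

% Study-specific version when data come from several studies s = 1, \dots, K
\tau_s(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x,\; S = s \,\right]

% Under consistency and no unmeasured confounding given X
% (which holds by design in a randomized trial), the CATE is identified by
\tau(x) = \mathbb{E}\left[\, Y \mid X = x,\; A = 1 \,\right]
        - \mathbb{E}\left[\, Y \mid X = x,\; A = 0 \,\right]
```

Combining datasets typically adds an assumption about how \tau_s(x) relates across studies; which assumption is appropriate depends on the setting the paper reviews.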
Citations: 5
Replication Success Under Questionable Research Practices—a Simulation Study
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts904
Francesca Freuli, Leonhard Held, Rachel Heyard
Increasing evidence suggests that the reproducibility and replicability of scientific findings are threatened by researchers employing questionable research practices (QRPs) in order to achieve statistically significant results. Numerous metrics have been developed to determine replication success but it has not yet been investigated how well those metrics perform in the presence of QRPs. This paper aims to compare the performance of different metrics quantifying replication success in the presence of four types of QRPs: cherry picking of outcomes, questionable interim analyses, questionable inclusion of covariates, and questionable subgroup analyses. Our results show that the metric based on the version of the sceptical p-value that is recalibrated in terms of effect size performs better in maintaining low values of overall type-I error rate, but often requires larger replication sample sizes compared to metrics based on significance, the controlled version of the sceptical p-value, meta-analysis or Bayes factors, especially when severe QRPs are employed.
Citations: 2
Game-Theoretic Statistics and Safe Anytime-Valid Inference
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts894
Aaditya Ramdas, Peter Grünwald, Vladimir Vovk, Glenn Shafer
Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty—e-processes for testing and confidence sequences for estimation—that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative martingales starting at one. Since a test martingale is the wealth process of a player in a betting game, SAVI centrally employs game-theoretic intuition, language and mathematics. We summarize the SAVI goals and philosophy, and report recent advances in testing composite hypotheses and estimating functionals in nonparametric settings.
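The idea of a test martingale as a bettor's wealth can be sketched with a toy example. This is only an illustration under stated assumptions, not the authors' construction: the null hypothesis is a fair coin, the bettor wagers according to a fixed hypothetical alternative `q = 0.6`, and the function names are invented here. Under the null the wealth is a nonnegative martingale starting at one, so by Ville's inequality it reaches 1/alpha with probability at most alpha, which is what allows monitoring after every observation.

```python
import random


def wealth_process(xs, q=0.6):
    """Wealth of a likelihood-ratio bettor testing H0: P(X = 1) = 0.5.

    q is a hypothetical alternative chosen for illustration. Each factor
    has expectation 1 under H0, so the running product is a test martingale.
    """
    wealth = 1.0
    path = []
    for x in xs:
        wealth *= (q if x == 1 else 1.0 - q) / 0.5
        path.append(wealth)
    return path


def first_rejection(path, alpha=0.05):
    """Return the first time (1-indexed) the wealth reaches 1/alpha, else None."""
    for t, w in enumerate(path, start=1):
        if w >= 1.0 / alpha:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated biased coin, so the null is false in this demo.
    xs = [1 if rng.random() < 0.7 else 0 for _ in range(500)]
    print(first_rejection(wealth_process(xs)))
```

A fixed `q` keeps the sketch short; adaptive betting strategies are a separate design choice that this example does not attempt.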
Citations: 0
Online Multiple Hypothesis Testing
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts901
David S. Robertson, James M. S. Wason, Aaditya Ramdas
Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of maintaining control of a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complexity is that hypotheses are tested in an online manner, one-by-one over time. However, traditional procedures that control the FDR, such as the Benjamini-Hochberg procedure, assume that all p-values are available to be tested at a single time point. To address these challenges, a new field of methodology has developed over the past 15 years showing how to control error rates for online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based both on the evidence against it, and on the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, with a review of key theory as well as a focus on applied examples. We also provide simulation results comparing different online testing algorithms and an up-to-date overview of the many methodological extensions that have been proposed.
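The streaming setup described above, where each decision uses the current evidence and the earlier rejections, can be sketched with one simple rule from this literature, commonly called LOND. This is a minimal illustration and not necessarily one of the algorithms the paper compares. The function name `lond` and the choice of the sequence `gamma_t = 6 / (pi^2 t^2)` (which sums to one) are assumptions made here.

```python
import math


def lond(p_values, alpha=0.05):
    """Sketch of the LOND rule for online FDR control.

    At time t the test level is alpha * gamma_t * (D + 1), where D is the
    number of rejections made before time t and gamma_t is a nonnegative
    sequence summing to one (here 6 / (pi^2 * t^2), an illustrative choice).
    """
    decisions = []
    discoveries = 0
    for t, p in enumerate(p_values, start=1):
        gamma_t = 6.0 / (math.pi ** 2 * t ** 2)
        level = alpha * gamma_t * (discoveries + 1)
        reject = p <= level
        decisions.append(reject)
        if reject:
            discoveries += 1  # earlier rejections raise later test levels
    return decisions


if __name__ == "__main__":
    print(lond([1e-6, 0.5, 0.01, 1e-4]))
```

Unlike the Benjamini-Hochberg procedure, this rule never needs p-values that have not yet arrived, which is the constraint the abstract describes.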
Citations: 1
Distributionally Robust and Generalizable Inference
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts902
Dominik Rothenhäusler, Peter Bühlmann
We discuss recently developed methods that quantify the stability and generalizability of statistical findings under distributional changes. In many practical problems, the data is not drawn i.i.d. from the target population. For example, unobserved sampling bias, batch effects, or unknown associations might inflate the variance compared to i.i.d. sampling. For reliable statistical inference, it is thus necessary to account for these types of variation. We discuss and review two methods that make it possible to quantify distribution stability based on a single dataset. The first method computes the sensitivity of a parameter under worst-case distributional perturbations to understand which types of shift pose a threat to external validity. The second method treats distributional shifts as random, which makes it possible to assess average robustness (instead of worst-case robustness). Based on a stability analysis of multiple estimators on a single dataset, it integrates both sampling and distributional uncertainty into a single confidence interval.
Citations: 3
Replicability Across Multiple Studies
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts892
Marina Bogomolov, Ruth Heller
Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive since discoveries are possible even when all the individual studies are underpowered. However, the meta-analytic discoveries may be entirely driven by signal in a single study, and thus nonreplicable. Although the great majority of meta-analyses carried out to date do not infer on the replicability of their findings, it is possible to do so. We provide a selective overview of analyses that can be carried out towards establishing replicability of the scientific findings. We describe methods for the setting where a single outcome is examined in multiple studies (as is common in systematic reviews of medical interventions), as well as for the setting where multiple studies each examine multiple features (as in genomics applications). We also discuss some of the current shortcomings and future directions.
Citations: 5
Defining Replicability of Prediction Rules
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts891
Giovanni Parmigiani
In this article, I propose an approach for defining replicability for prediction rules. Motivated by a recent report by the U.S.A. National Academy of Sciences, I start from the perspective that replicability is obtaining consistent results across studies suitable to address the same prediction question, each of which has obtained its own data. I then discuss concepts and issues in defining key elements of this statement. I focus specifically on the meaning of “consistent results” in typical utilization contexts, and propose a multi-agent framework for defining replicability, in which agents are neither allied nor adversaries. I recover some of the prevalent practical approaches as special cases. I hope to provide guidance for a more systematic assessment of replicability in machine learning.
Citations: 0
Tracking Truth Through Measurement and the Spyglass of Statistics
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-11-01 DOI: 10.1214/23-sts899
Antonio Possolo
The measurement of a quantity is reproducible when mutually independent, multiple measurements made of it yield mutually consistent measurement results, that is, when the measured values, after due allowance for their associated uncertainties, do not differ significantly from one another. Interlaboratory comparisons organized deliberately for the purpose, and meta-analyses that are structured so as to be fit for the same purpose, are procedures of choice to ascertain measurement reproducibility. The realistic evaluation of measurement uncertainty is a key preliminary to the assessment of reproducibility because lack of reproducibility manifests itself as dispersion or variability of measured values in excess of what their associated uncertainties suggest that they should exhibit. For this reason, we review the distinctive traits of measurement in the physical sciences and technologies, including medicine, and discuss the meaning and expression of measurement uncertainty. This contribution illustrates the application of statistical models and methods to quantify measurement uncertainty and to assess reproducibility in four concrete, real-life examples, in the process revealing that lack of reproducibility can be a consequence of one or more of the following: intrinsic differences between laboratories making measurements; choice of statistical model and of procedure for data reduction or of causes yet to be identified. Despite the instances of lack of reproducibility that we review, and many others like them, the outlook is optimistic. First, because “lack of reproducibility is not necessarily bad news; it may herald new discoveries and signal scientific progress” (Nat. Phys. 16 (2020) 117–119). 
Second, and as the example about the measurement of the Newtonian constant of gravitation, G, illustrates, when faced with a reproducibility crisis the scientific community often engages in cooperative efforts to understand the root causes of the lack of reproducibility, leading to advances in scientific knowledge.
Citations: 0
Note on Legendre’s Method of Least Squares
IF 5.7 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-08-01 DOI: 10.1214/23-sts887
J. Nyblom
Citations: 0
Rejoinder: Response-Adaptive Randomization in Clinical Trials
IF 5.7 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2023-05-01 DOI: 10.1214/23-sts865rej
D. Robertson, K. M. Lee, Boryana C. López-Kolkovska, S. Villar
Citations: 0