Replication Success Under Questionable Research Practices—a Simulation Study

Statistical Science | IF 3.9 | Zone 1, Mathematics | JCR Q1, Statistics & Probability | Pub Date: 2023-11-01 | DOI: 10.1214/23-sts904

Francesca Freuli, Leonhard Held, Rachel Heyard


Citations: 2

Abstract

Increasing evidence suggests that the reproducibility and replicability of scientific findings is threatened by researchers employing questionable research practices (QRPs) in order to achieve statistically significant results. Numerous metrics have been developed to determine replication success but it has not yet been investigated how well those metrics perform in the presence of QRPs. This paper aims to compare the performance of different metrics quantifying replication success in the presence of four types of QRPs: cherry picking of outcomes, questionable interim analyses, questionable inclusion of covariates, and questionable subgroup analyses. Our results show that the metric based on the version of the sceptical p-value that is recalibrated in terms of effect size performs better in maintaining low values of overall type-I error rate, but often requires larger replication sample sizes compared to metrics based on significance, the controlled version of the sceptical p-value, meta-analysis or Bayes factors, especially when severe QRPs are employed.
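To make the idea of an inflated overall type-I error rate concrete, the following is a minimal sketch of one QRP named in the abstract, cherry picking of outcomes, combined with the simplest success metric, significance of both studies. It is not the authors' simulation design, which the abstract does not describe. Everything specific here is an assumption made for illustration: the one-sided level of 0.025, the independence of the `k` outcomes, the function names `analytic_overall_type1` and `simulate_overall_type1`, and the idea that the replication honestly analyses only the single reported outcome.

```python
import random
from statistics import NormalDist

ALPHA = 0.025                              # assumed one-sided significance level
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA)   # critical value for a z-statistic


def analytic_overall_type1(k: int, alpha: float = ALPHA) -> float:
    """Probability, under the null, that the best of k independent original
    outcomes is significant AND an honest replication is significant too."""
    return (1 - (1 - alpha) ** k) * alpha


def simulate_overall_type1(k: int, n_sim: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability (hypothetical setup)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_sim):
        # Original study: k null z-statistics, only the largest is reported.
        z_orig = max(rng.gauss(0.0, 1.0) for _ in range(k))
        # Replication study: one null z-statistic for the reported outcome.
        z_rep = rng.gauss(0.0, 1.0)
        if z_orig > Z_CRIT and z_rep > Z_CRIT:
            successes += 1
    return successes / n_sim


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, analytic_overall_type1(k), simulate_overall_type1(k, 200_000))
```

With `k = 1` there is no cherry picking and the analytic value is the product of the two levels, 0.025 × 0.025. Larger `k` raises the chance of a false positive in the original study, and hence the overall error rate, even though the replication itself is analysed honestly. Correlated outcomes, which are more realistic, would inflate the rate less than this independent-outcome sketch suggests.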


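Source journal

Statistical Science (Mathematics: Statistics & Probability)

CiteScore: 6.50

Self-citation rate: 1.80%

Number of articles published: not given

Review time: >12 weeks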

About the journal: The central purpose of Statistical Science is to convey the richness, breadth and unity of the field by presenting the full range of contemporary statistical thought at a moderate technical level, accessible to the wide community of practitioners, researchers and students of statistics and probability.

Latest articles from this journal

Variable Selection Using Bayesian Additive Regression Trees

Defining Replicability of Prediction Rules

Tracking Truth Through Measurement and the Spyglass of Statistics

Replicability Across Multiple Studies

Game-Theoretic Statistics and Safe Anytime-Valid Inference