Correspondence measures for assessing replication success.

Psychological Methods (IF 7.8; Zone 1, Psychology; JCR Q1, Psychology, Multidisciplinary). Pub Date: 2025-08-01. Epub Date: 2023-07-27. DOI: 10.1037/met0000597

Peter M Steiner, Patrick Sheehan, Vivian C Wong

Pages: 793-814. Publication type: Journal Article.


Abstract

Given recent evidence challenging the replicability of results in the social and behavioral sciences, critical questions have been raised about appropriate measures for determining replication success in comparing effect estimates across studies. At issue is the fact that conclusions about replication success often depend on the measure used for evaluating correspondence in results. Despite the importance of choosing an appropriate measure, there is still no widespread agreement about which measures should be used. This article addresses these questions by formally describing the most commonly used measures for assessing replication success, and by comparing their performance in different contexts according to their replication probabilities, that is, the probability of obtaining replication success given study-specific settings. The measures may be characterized broadly as conclusion-based approaches, which assess the congruence of two independent studies' conclusions about the presence of an effect, and distance-based approaches, which test for a significant difference or equivalence of two effect estimates. We also introduce a new measure for assessing replication success called the correspondence test, which combines a difference and equivalence test in the same framework. To help researchers plan prospective replication efforts, we provide closed formulas for power calculations that can be used to determine the minimum detectable effect size (and thus, sample sizes) for each study so that a predetermined minimum replication probability can be achieved. Finally, we use a replication data set from the Open Science Collaboration (2015) to demonstrate the extent to which conclusions about replication success depend on the correspondence measure selected. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
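To make the distinction between the measures concrete, the following Python sketch illustrates a conclusion-based check and a distance-based check on two effect estimates. It is not the authors' implementation: the normal approximation, the function names (conclusion_based, correspondence), the equivalence threshold delta, and the four outcome labels are assumptions chosen for illustration. The difference test is a two-sided z-test on the gap between the estimates, and the equivalence test is sketched as two one-sided tests against -delta and +delta.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conclusion_based(b1, se1, b2, se2, alpha=0.05):
    """True if both studies reject a zero effect and agree in sign."""
    def two_sided_p(b, se):
        return 2.0 * (1.0 - norm_cdf(abs(b / se)))

    both_significant = two_sided_p(b1, se1) < alpha and two_sided_p(b2, se2) < alpha
    same_sign = (b1 > 0) == (b2 > 0)
    return both_significant and same_sign


def correspondence(b1, se1, b2, se2, delta, alpha=0.05):
    """Combine a difference test and an equivalence test (assumed labels)."""
    d = b1 - b2
    se_d = math.sqrt(se1 ** 2 + se2 ** 2)  # assumes independent studies

    # Difference test: H0 is that the two effects are equal.
    p_diff = 2.0 * (1.0 - norm_cdf(abs(d / se_d)))

    # Equivalence test: H0 is |difference| >= delta; both one-sided
    # tests must reject, so the larger p-value decides.
    p_lower = 1.0 - norm_cdf((d + delta) / se_d)
    p_upper = norm_cdf((d - delta) / se_d)
    p_equiv = max(p_lower, p_upper)

    different = p_diff < alpha
    equivalent = p_equiv < alpha
    if different and equivalent:
        return "trivial difference"
    if different:
        return "difference"
    if equivalent:
        return "equivalence"
    return "indeterminate"


if __name__ == "__main__":
    # Hypothetical estimates and standard errors, not data from the article.
    print(conclusion_based(0.30, 0.05, 0.28, 0.05))
    print(correspondence(0.30, 0.05, 0.28, 0.05, delta=0.2))
    print(correspondence(0.60, 0.05, 0.10, 0.05, delta=0.2))
    print(correspondence(0.30, 0.20, 0.20, 0.20, delta=0.2))
```

With imprecise estimates, neither test rejects and the sketch returns "indeterminate", which shows why the verdict on replication success can hinge on the chosen measure and on each study's sample size.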




Source journal: Psychological Methods (Psychology, Multidisciplinary)

CiteScore: 13.10

Self-citation rate: 7.10%

Articles published: 159

Journal description: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.

Latest articles from this journal:

A primer on equivalence (negligible effect) testing.

Inaugural editorial.

Using latent class analysis to justify a latent continuum in item development.

Planned missingness to reduce survey length: A sheep in wolf’s clothing.

Supplemental Material for Using Latent Class Analysis to Justify a Latent Continuum in Item Development