The Bayes factor, HDI-ROPE, and frequentist equivalence tests can all be reverse engineered - almost exactly - from one another: Reply to Linde et al. (2021).
Authors: Harlan Campbell, Paul Gustafson
Journal: Psychological Methods, pp. 613-623 (Journal Article; JCR Q1, Psychology, Multidisciplinary; impact factor 7.6)
Published: 2024-06-01 (Epub 2024-03-21)
DOI: 10.1037/met0000507 (https://doi.org/10.1037/met0000507)
Citation count: 0
Abstract
Following an extensive simulation study comparing the operating characteristics of three different procedures used for establishing equivalence (the frequentist "TOST," the Bayesian "HDI-ROPE," and the Bayes factor interval null procedure), Linde et al. (2021) conclude with the recommendation that "researchers rely more on the Bayes factor interval null approach for quantifying evidence for equivalence" (p. 1). We redo the simulation study of Linde et al. (2021) in its entirety but with the different procedures calibrated to have the same predetermined maximum Type I error rate. Our results suggest that, when calibrated in this way, the Bayes factor, HDI-ROPE, and frequentist equivalence tests all have similar (almost exactly equal) Type II error rates. In general, any argument that frequentist testing is better or worse than Bayesian testing on the basis of empirical findings seems dubious at best. If one decides on which underlying principle to subscribe to in tackling a given problem, then the method follows naturally. Bearing in mind that each procedure can be reverse-engineered from the others (at least approximately), trying to use empirical performance to argue for one approach over another seems like tilting at windmills. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
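The calibration idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' simulation. It assumes a single normal mean with known standard deviation, a flat prior (so the posterior is normal, centered on the sample mean), and hypothetical settings (`sigma`, `n`, `margin`, `alpha`) chosen only for illustration. Under those assumptions, an HDI covering `1 - 2*alpha` of the posterior is the same interval TOST implicitly uses at level `alpha`, so the two procedures reach the same decision on every dataset and share the same Type I and Type II error rates.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def tost_declares_equivalence(xbar, se, margin, alpha):
    """TOST with known sigma: both one-sided z-tests must reject."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    reject_lower = (xbar + margin) / se > z_crit    # rejects H0: mu <= -margin
    reject_upper = (xbar - margin) / se < -z_crit   # rejects H0: mu >= +margin
    return reject_lower and reject_upper

def hdi_rope_declares_equivalence(xbar, se, margin, hdi_mass):
    """HDI-ROPE under an assumed flat prior: posterior is N(xbar, se^2),
    so the HDI is the central interval holding hdi_mass of the posterior."""
    half_width = STD_NORMAL.inv_cdf(0.5 + hdi_mass / 2) * se
    return (xbar - half_width > -margin) and (xbar + half_width < margin)

def simulate(mu, sigma=1.0, n=50, margin=0.5, alpha=0.05, n_sims=20000, seed=1):
    """Return (TOST rate, HDI-ROPE rate, number of disagreements)
    for declaring equivalence when the true mean is mu."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    tost_hits = hdi_hits = disagreements = 0
    for _ in range(n_sims):
        xbar = rng.gauss(mu, se)  # sampling distribution of the sample mean
        t = tost_declares_equivalence(xbar, se, margin, alpha)
        # Calibration step: pick the HDI mass that matches TOST at level alpha.
        h = hdi_rope_declares_equivalence(xbar, se, margin, hdi_mass=1 - 2 * alpha)
        tost_hits += t
        hdi_hits += h
        disagreements += (t != h)
    return tost_hits / n_sims, hdi_hits / n_sims, disagreements

if __name__ == "__main__":
    # mu on the equivalence boundary: the rate is the (maximum) Type I error.
    print("boundary:", simulate(mu=0.5))
    # mu = 0: the rate is power, so 1 - rate is the Type II error.
    print("true equivalence:", simulate(mu=0.0))
```

With an informative prior or an unknown variance the match would only be approximate, which is consistent with the abstract's "at least approximately" qualifier.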
About the journal:
Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.