The meta-analysis reported in Vahey et al. (2015) concluded that the Implicit Relational Assessment Procedure (IRAP) has high clinical criterion validity (meta-analytic r‾ = .45) and therefore demonstrates "the potential of the IRAP as a tool for clinical assessment" (p. 64). Vahey et al. (2015) also reported power analyses, and the article is frequently cited for sample size determination in IRAP studies, especially its heuristic of N > 37. This article attempts to verify those results. The original results were found to have very poor reproducibility at almost every stage of data extraction and analysis, with errors generally biased toward inflating the effect size. The reported meta-analysis results were found to be mathematically implausible and could not be reproduced despite numerous attempts. Multiple internal discrepancies were found among the effect sizes, such as between the forest plot and the funnel plot, and between the forest plot and the supplementary data. Twenty-three of the 56 individual effect sizes (41.1%) were not actually criterion effects and therefore did not meet the original inclusion criteria. The original results were further undermined by the combination of effect sizes with different estimands. Reextraction of effect sizes from the original articles revealed 360 additional effect sizes meeting the inclusion criteria that should have been included in the original analysis. Instances of selection bias favoring the inclusion of larger effect sizes were also observed. A new meta-analysis was calculated to understand the compound impact of these errors (i.e., without endorsing its results as a valid estimate of the IRAP's criterion validity). The resulting effect size was half that of the original (r‾ = .22), and the corresponding power analyses recommended sample sizes nearly 10 times larger than the original (N > 346), which no published original study using the IRAP has met. In aggregate, these findings seriously undermine the credibility and utility of the original article's conclusions and recommendations. Vahey et al. (2015) appears to require substantial correction at a minimum. In particular, researchers should not rely on its results for sample size justification. A list of suggestions for error detection in meta-analyses is provided.
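To make the sample size figures concrete, the sketch below applies the standard Fisher z approximation for correlation power analysis, assuming a two-tailed α = .05 and 80% power; these conventional inputs reproduce the N > 37 heuristic for r‾ = .45. The exact inputs behind the revised N > 346 recommendation are not specified in this abstract and presumably target a smaller effect than the r‾ = .22 point estimate, so the code illustrates the method rather than reproducing either article's analysis.

```python
# Minimal sketch of correlation power analysis via the Fisher z
# approximation (assumed conventional inputs, not either article's code):
#   n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3
import math

from scipy.stats import norm


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Smallest N that detects a true correlation r at the given alpha and power."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-tailed critical value
    z_beta = norm.ppf(power)           # power quantile
    fisher_z = math.atanh(r)           # 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


print(n_for_correlation(0.45))  # 37  -- matches the original N > 37 heuristic
print(n_for_correlation(0.22))  # 160 -- already ~4x larger at the revised point estimate
```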
Hussey (in press) recently conducted a detailed critical reanalysis of Vahey, Nicholson, and Barnes-Holmes' (2015) meta-analysis. Its stated purpose was to (a) examine the extent to which Vahey et al.'s (2015) meta-analysis contains errors, and (b) test how computationally reproducible it is by current standards of best practice. Hussey identified a small number of minor numerical errors but, crucially, was unable to exactly reproduce the original meta-effect of r‾ = .45. Across six variations of the meta-analysis reported by Vahey et al., Hussey obtained meta-effects that deviated from the original by Δr‾ = .01–.02. Hussey also reported corresponding 95% credibility intervals that were all of zero width. These discrepancies prompted the present authors to conduct a detailed audit of the original meta-analysis, which revealed one minor transposition error in addition to the three identified by Hussey. Once corrected, these errors resulted in a marginally increased Hunter and Schmidt meta-analytic effect of r‾ = .46 without a credibility interval, and a Hedges-Vevea meta-effect of r‾ = .47 with a 95% confidence interval of (.40, .54). This correction was too small to have any bearing on Vahey et al.'s supplementary analyses regarding publication bias or statistical power. Even so, Vahey et al. contained a much lower proportion of transposition errors than is typical of meta-analyses (cf. Kadlec, Sainani, & Nimphius, 2023; Lakens et al., 2016; Lakens et al., 2017). Nonetheless, Hussey highlighted important ambiguities about the theoretical and practical meaning of the meta-effect reported by Vahey et al. In what follows, we clarify our position on these matters and, in so doing, explain why we believe that the wider IRAP literature would undoubtedly benefit from increased adoption of contemporary open science standards.
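For readers unfamiliar with why a 95% credibility interval can have zero width, the sketch below implements textbook Hunter and Schmidt formulas on illustrative numbers (our assumptions throughout; this is neither article's code nor data): whenever the sample-size-weighted variance of the observed correlations is no larger than the variance expected from sampling error alone, the estimated true-effect variance is truncated at zero and the interval collapses onto r‾.

```python
# Minimal sketch of a bare-bones Hunter & Schmidt random-effects
# aggregation with a 95% credibility interval (illustrative inputs,
# not data from either article).
import numpy as np
from scipy.stats import norm


def hunter_schmidt(r, n):
    r, n = np.asarray(r, float), np.asarray(n, float)
    r_bar = np.sum(n * r) / np.sum(n)                    # N-weighted mean correlation
    var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)   # observed variance of r
    var_err = (1 - r_bar**2) ** 2 / (np.mean(n) - 1)     # expected sampling-error variance
    var_true = max(var_obs - var_err, 0.0)               # truncated at zero
    half_width = norm.ppf(0.975) * np.sqrt(var_true)
    return r_bar, (r_bar - half_width, r_bar + half_width)


# Correlations whose spread is well within sampling error: var_true = 0,
# so the 95% credibility interval has zero width around r_bar.
print(hunter_schmidt([0.44, 0.46, 0.45], [40, 55, 50]))
```

Under these formulas, a zero-width interval simply means the model attributes all observed spread in the effect sizes to sampling error.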