A key aspect of scientific reliability is replicability, that is, obtaining consistent results when an experiment is repeated. In embryo-fetal developmental toxicity (EFDT) studies, replicability can be assessed using in vitro models, targeted in vivo studies, and/or the second species study. This work assesses the replicability of whole-animal studies using historical rat data.
Data for two endpoints from five full studies were downloaded from the National Toxicology Program (NTP) website. To evaluate within-study replicability, each full group was divided into two replicate sets under each of two schemes (odd/even and top/bottom animal order). Analyses included summary statistics, scatter plots, a modified Levene's test for homogeneity of variances, and Cohen's d to assess effect sizes.
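The splitting and comparison steps described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that "top/bottom" means the first and second halves of the animal list, that Cohen's d uses the pooled standard deviation, and that the modified Levene's test is the median-centred (Brown-Forsythe) variant; the abstract specifies none of these. All function names and the fetal weight values are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def split_odd_even(values):
    """Split by animal order: positions 1, 3, 5, ... versus 2, 4, 6, ..."""
    return values[0::2], values[1::2]


def split_top_bottom(values):
    """Split into first half and second half (assumed meaning of top/bottom)."""
    half = (len(values) + 1) // 2
    return values[:half], values[half:]


def cohens_d(a, b):
    """Cohen's d with the pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def brown_forsythe_statistic(*groups):
    """F statistic of a one-way ANOVA on absolute deviations from group medians.

    Returns the statistic only; a p-value needs the F distribution with
    (k - 1, n - k) degrees of freedom, which the standard library lacks.
    """
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(devs)
    n = sum(len(d) for d in devs)
    grand = mean([z for d in devs for z in d])
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs) / (k - 1)
    within = sum((z - mean(d)) ** 2 for d in devs for z in d) / (n - k)
    return between / within


# Hypothetical fetal weights (g) for one dose group, in animal order.
fetal_weights = [3.5, 3.7, 3.6, 3.4, 3.8, 3.5, 3.6, 3.9]

for name, splitter in (("odd/even", split_odd_even), ("top/bottom", split_top_bottom)):
    rep_a, rep_b = splitter(fetal_weights)
    print(name, "Cohen's d:", round(cohens_d(rep_a, rep_b), 3),
          "Brown-Forsythe F:", round(brown_forsythe_statistic(rep_a, rep_b), 3))
```

If SciPy is available, `scipy.stats.levene(rep_a, rep_b, center="median")` computes the same statistic together with a p-value.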
Replicate means deviated from those of the original study by only 0.4%–3.7% and differed by ≤ 7% between replicates (with differences < 5% in 87% of groups). Coefficients of variation (CV%) were generally consistent across subgroups, with few above 10%. Variance testing revealed significant differences in two of the five studies, and one study showed fetal weight effects in opposite directions, in the odd/even split only. Evaluations of adjusted maternal weight gain were comparable across subgroups.
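The percentage metrics reported above can be computed along these lines. This is a sketch under stated assumptions: the abstract does not give the denominators, so deviation is taken relative to the full-group mean and the between-replicate difference relative to the average of the two replicate means. The function names and data values are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation as a percentage (sample SD / mean)."""
    return 100.0 * stdev(values) / mean(values)


def pct_deviation(replicate, full):
    """Absolute deviation of a replicate mean from the full-group mean, in %."""
    return 100.0 * abs(mean(replicate) - mean(full)) / mean(full)


def pct_difference(rep_a, rep_b):
    """Absolute difference between replicate means, in % of their average.

    The choice of denominator is an assumption, not taken from the source.
    """
    ma, mb = mean(rep_a), mean(rep_b)
    return 100.0 * abs(ma - mb) / ((ma + mb) / 2)


# Hypothetical fetal weights (g) for one dose group, split odd/even.
full_group = [3.5, 3.7, 3.6, 3.4, 3.8, 3.5, 3.6, 3.9]
rep_a, rep_b = full_group[0::2], full_group[1::2]

print("CV% full:", round(cv_percent(full_group), 2))
print("Deviation of replicate A from full group (%):", round(pct_deviation(rep_a, full_group), 2))
print("Difference between replicates (%):", round(pct_difference(rep_a, rep_b), 2))
```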
The observed 5%–7% differences between these idealized replicates may represent the lower bound for acceptable variability when merging replicate data sets. This work lays the groundwork for more robust evaluations of replicability in EFDT studies and may inform future regulatory guidance.