Objective: To externally validate a fully automated embryo classification algorithm in in vitro fertilization (IVF) treatments.
Design: Retrospective cohort study.
Subjects: A total of 6,434 patients undergoing 7,352 IVF treatments contributed 70,456 embryos.
Exposure: Embryos were evaluated by conventional morphology and retrospectively scored using a fully automated deep learning-based algorithm across conventional IVF, oocyte donation, and PGT-A cycles.
Main outcome measures: The primary outcomes were implantation and live birth, with odds ratios (ORs) estimated from generalized estimating equation (GEE) models. Secondary outcomes were embryo morphology, euploidy, and miscarriage. Exploratory outcomes included a comparison between conventional methodology and the artificial intelligence (AI) algorithm using areas under the receiver operating characteristic (ROC) curve (AUCs), the degree of agreement between the AI and embryologists (Cohen's kappa coefficient), and relative risk (RR).
Results: Implantation and live birth rates increased as the automatic embryo score rose. The GEE model, controlling for confounders, showed that the automatic score was associated with an OR of 1.31 (95% CI [1.25-1.36]) for implantation in treatments using oocytes from patients and an OR of 1.17 (95% CI [1.14-1.20]) in the oocyte donation program, with no significant association in PGT-A treatments. For live birth, the ORs were 1.27 (95% CI [1.21-1.33]) for patients, 1.16 (95% CI [1.13-1.19]) for donors, and 1.05 (95% CI [1.00-1.10]) for PGT-A cycles. The average score was higher in embryos with better morphology, in euploid embryos compared to aneuploid embryos, and in embryos that resulted in a full-term pregnancy compared to those that miscarried. Concordance between the highest-scoring embryo and the embryo with the best conventional morphology was 71.4% (95% CI [67.7%-75.0%]) in treatments with patient oocytes and 61.0% (95% CI [58.6%-63.4%]) in the oocyte donation program. Overall, the Cohen's kappa coefficient was 0.63. The automatic embryo score showed similar AUCs to conventional morphology, although implantation was higher when the transferred embryo matched the highest-scoring embryo from each cohort (57.36% vs. 49.98%). RR indicated a 1.14-fold increase in implantation likelihood when the top-ranked embryo was transferred.
Conclusion: Fully automated embryo scoring effectively ranked embryos based on their potential for implantation and live birth. The performance of the conventional methodology was comparable to that of the artificial intelligence-based technology; however, better clinical outcomes were observed when the highest-scoring embryo in the cohort was transferred.
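Illustrative note: The two exploratory statistics named in the Results, relative risk and Cohen's kappa, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the study's analysis code: the function names are hypothetical, the 2x2 agreement table holds made-up counts chosen only to show the arithmetic, and the crude ratio of the two reported implantation rates is not guaranteed to reproduce the published RR, which may have been derived or rounded differently.

```python
def relative_risk(rate_exposed, rate_unexposed):
    """Crude relative risk: ratio of two event rates (proportions)."""
    return rate_exposed / rate_unexposed


def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of items rater A put in category i
    and rater B put in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of items on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Expected chance agreement from the marginal proportions.
    row_props = [sum(table[i]) / n for i in range(k)]
    col_props = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_props, col_props))
    return (p_observed - p_expected) / (1 - p_expected)


# Implantation rates reported in the Results (as proportions).
rr = relative_risk(0.5736, 0.4998)

# HYPOTHETICAL counts, not study data: rows = AI top choice (yes/no),
# columns = embryologist top choice (yes/no).
example_table = [[40, 10],
                 [10, 40]]
kappa = cohens_kappa(example_table)

print(f"crude RR = {rr:.4f}")
print(f"kappa (hypothetical table) = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a concordance percentage and a kappa value computed on the same data can differ noticeably.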