Application of a methodological framework for the development and multicenter validation of reliable artificial intelligence in embryo evaluation.

IF 4.7 | Zone 2 (Medicine) | JCR Q1, Endocrinology & Metabolism | Reproductive Biology and Endocrinology | Pub Date: 2025-01-31 | DOI: 10.1186/s12958-025-01351-w
D Gilboa, Akhil Garg, M Shapiro, M Meseguer, Y Amar, N Lustgarten, N Desai, T Shavit, V Silva, A Papatheodorou, A Chatziparasidou, S Angras, J H Lee, L Thiel, C L Curchoe, Y Tauber, D S Seidman
Reproductive Biology and Endocrinology 23(1):16. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783712/pdf/

Abstract

Background: Artificial intelligence (AI) models analyzing embryo time-lapse images have been developed to predict the likelihood of pregnancy following in vitro fertilization (IVF). However, limited research exists on methods ensuring AI consistency and reliability in clinical settings during its development and validation process. We present a methodology for developing and validating an AI model across multiple datasets to demonstrate reliable performance in evaluating blastocyst-stage embryos.

Methods: This multicenter analysis utilizes time-lapse images, pregnancy outcomes, and morphologic annotations from embryos collected at 10 IVF clinics across 9 countries between 2018 and 2022. The four-step methodology for developing and evaluating the AI model includes: (I) curating annotated datasets that represent the intended clinical use case; (II) developing and optimizing the AI model; (III) evaluating the AI's performance by assessing its discriminative power and associations with pregnancy probability across variable data; and (IV) ensuring interpretability and explainability by correlating AI scores with relevant morphologic features of embryo quality. Three datasets were used: the training and validation dataset (n = 16,935 embryos), the blind test dataset (n = 1,708 embryos; 3 clinics), and the independent dataset (n = 7,445 embryos; 7 clinics) derived from previously unseen clinic cohorts.
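The independent dataset described above is drawn from clinics the model never saw during development. A minimal sketch of that kind of clinic-level hold-out is shown below. The record fields (`embryo_id`, `clinic`, `fh`), the clinic labels, and the function name `split_by_clinic` are hypothetical illustrations, not the authors' pipeline, and the abstract does not say how clinics were assigned to each dataset.

```python
def split_by_clinic(records, held_out_clinics):
    """Partition embryo records so held-out clinics never appear in development data.

    `records` is a list of dicts with a "clinic" key (hypothetical schema).
    """
    held_out = set(held_out_clinics)
    development, independent = [], []
    for rec in records:
        # Route the whole clinic to one side; no clinic is ever split across both sets.
        if rec["clinic"] in held_out:
            independent.append(rec)
        else:
            development.append(rec)
    return development, independent


# Toy records for illustration only; these are not study data.
records = [
    {"embryo_id": "e1", "clinic": "A", "fh": 1},
    {"embryo_id": "e2", "clinic": "A", "fh": 0},
    {"embryo_id": "e3", "clinic": "B", "fh": 1},
    {"embryo_id": "e4", "clinic": "C", "fh": 0},
]

development, independent = split_by_clinic(records, held_out_clinics=["C"])
```

Splitting at the clinic level, as opposed to shuffling individual embryos, keeps site-specific factors such as imaging equipment or lab protocol from leaking between the development and evaluation sets.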

Results: The AI was designed as a deep learning classifier ranking embryos by score according to their likelihood of clinical pregnancy. Higher AI score brackets were associated with increased fetal heartbeat (FH) likelihood across all evaluated datasets, showing a trend of increasing odds ratios (OR). The highest OR was observed in the top G4 bracket (test dataset G4 score ≥ 7.5: OR 3.84; independent dataset G4 score ≥ 7.5: OR 4.01), while the lowest was in the G1 bracket (test dataset G1 score < 4.0: OR 0.40; independent dataset G1 score < 4.0: OR 0.45). AI score brackets G2, G3, and G4 displayed OR values above 1.0 (P < 0.05), indicating linear associations with FH likelihood. Average AI scores were consistently higher for FH-positive than for FH-negative embryos within each age subgroup. Positive correlations were also observed between AI scores and key morphologic parameters used to predict embryo quality.
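To make the bracket-level odds ratios concrete, the sketch below computes an OR for fetal heartbeat from a 2x2 table of in-bracket versus out-of-bracket embryos. It assumes each bracket is compared against all embryos outside it, which the abstract does not state. Only the G1 (< 4.0) and G4 (≥ 7.5) thresholds come from the abstract. The function name, the continuity correction, and the toy scores and outcomes are illustrative assumptions, not study data.

```python
def odds_ratio(scores, outcomes, lower=None, upper=None):
    """Odds ratio of a positive outcome for scores in [lower, upper) versus all others.

    Assumption: the comparison group is every embryo outside the bracket.
    """
    in_pos = in_neg = out_pos = out_neg = 0
    for score, outcome in zip(scores, outcomes):
        inside = (lower is None or score >= lower) and (upper is None or score < upper)
        if inside and outcome:
            in_pos += 1
        elif inside:
            in_neg += 1
        elif outcome:
            out_pos += 1
        else:
            out_neg += 1
    cells = [in_pos, in_neg, out_pos, out_neg]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is empty,
    # which avoids division by zero in small samples.
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Toy data for illustration only (1 = fetal heartbeat observed, 0 = not observed).
scores = [8.1, 7.9, 9.0, 7.6, 5.0, 6.2, 3.1, 2.5, 3.9, 6.8]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

or_g4 = odds_ratio(scores, outcomes, lower=7.5)  # top bracket, score >= 7.5
or_g1 = odds_ratio(scores, outcomes, upper=4.0)  # bottom bracket, score < 4.0
```

An OR above 1.0 means embryos in the bracket have higher odds of fetal heartbeat than the comparison group, and an OR below 1.0 means lower odds, which matches the direction reported for G4 and G1 respectively.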

Conclusions: Strong AI performance across multiple datasets demonstrates the value of our four-step methodology in developing and validating the AI as a reliable adjunct to embryo evaluation.

Source journal
Reproductive Biology and Endocrinology (Medicine: Endocrinology & Metabolism)
CiteScore: 7.90
Self-citation rate: 2.30%
Articles published: 161
Review time: 4-8 weeks
Journal description: Reproductive Biology and Endocrinology publishes and disseminates high-quality results from excellent research in the reproductive sciences. The journal publishes on topics covering gametogenesis, fertilization, early embryonic development, embryo-uterus interaction, reproductive development, pregnancy, uterine biology, endocrinology of reproduction, control of reproduction, reproductive immunology, neuroendocrinology, and veterinary and human reproductive medicine, including all vertebrate species.