Assessing generalizability of an AI-based visual test for cervical cancer screening.

PLOS digital health. Pub Date: 2024-10-02. eCollection Date: 2024-10-01. DOI: 10.1371/journal.pdig.0000364
Syed Rakin Ahmed, Didem Egemen, Brian Befano, Ana Cecilia Rodriguez, Jose Jeronimo, Kanan Desai, Carolina Teran, Karla Alfaro, Joel Fokom-Domgue, Kittipat Charoenkwan, Chemtai Mungo, Rebecca Luckett, Rakiya Saidu, Taina Raiol, Ana Ribeiro, Julia C Gage, Silvia de Sanjose, Jayashree Kalpathy-Cramer, Mark Schiffman
PLOS digital health, volume 3, issue 10, e0000364. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446437/pdf/

Abstract

A number of challenges hinder artificial intelligence (AI) models from effective clinical translation. Foremost among these challenges is the lack of generalizability, which is defined as the ability of a model to perform well on datasets that have different characteristics from the training data. We recently investigated the development of an AI pipeline on digital images of the cervix, utilizing a multi-heterogeneous dataset of 9,462 women (17,013 images) and a multi-stage model selection and optimization approach, to generate a diagnostic classifier able to classify images of the cervix into "normal", "indeterminate" and "precancer/cancer" (denoted as "precancer+") categories. In this work, we investigate the performance of this multiclass classifier on external data not utilized in training and internal validation, to assess the generalizability of the classifier when moving to new settings. We assessed both the classification performance and repeatability of our classifier model across the two axes of heterogeneity present in our dataset: image capture device and geography, utilizing both out-of-the-box inference and retraining with external data. Our results demonstrate that device-level heterogeneity affects our model performance more than geography-level heterogeneity. Classification performance of our model is strong on images from a new geography without retraining, while incremental retraining with inclusion of images from a new device progressively improves classification performance on that device up to a point of saturation. Repeatability of our model is relatively unaffected by data heterogeneity and remains strong throughout. Our work supports the need for optimized retraining approaches that address data heterogeneity (e.g., when moving to a new device) to facilitate effective use of AI models in new settings.
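The evaluation described in the abstract (classification performance and repeatability, broken down by image capture device and by geography) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Prediction` record, the `accuracy_by` and `repeatability` helpers, the device and site names, and the toy data are hypothetical. The abstract does not state which metrics the authors used, so plain accuracy and a simple "all images of one woman receive the same class" rate stand in for them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One classified image (hypothetical record layout)."""
    woman_id: str
    device: str
    geography: str
    true_label: str   # "normal", "indeterminate" or "precancer+"
    pred_label: str


def accuracy_by(records, axis):
    """Accuracy per value of `axis` ("device" or "geography")."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = getattr(r, axis)
        totals[group] += 1
        hits[group] += int(r.pred_label == r.true_label)
    return {g: hits[g] / totals[g] for g in totals}


def repeatability(records):
    """Fraction of women with two or more images whose images
    all receive the same predicted class (a stand-in metric)."""
    preds = defaultdict(list)
    for r in records:
        preds[r.woman_id].append(r.pred_label)
    multi = [p for p in preds.values() if len(p) >= 2]
    if not multi:
        return None  # repeatability is undefined without repeat images
    return sum(len(set(p)) == 1 for p in multi) / len(multi)


# Toy data, invented purely to exercise the functions.
records = [
    Prediction("w1", "device_A", "site_1", "normal", "normal"),
    Prediction("w1", "device_A", "site_1", "normal", "normal"),
    Prediction("w2", "device_A", "site_2", "precancer+", "precancer+"),
    Prediction("w2", "device_A", "site_2", "precancer+", "indeterminate"),
    Prediction("w3", "device_B", "site_1", "normal", "indeterminate"),
    Prediction("w3", "device_B", "site_1", "normal", "indeterminate"),
]

if __name__ == "__main__":
    print("by device:   ", accuracy_by(records, "device"))
    print("by geography:", accuracy_by(records, "geography"))
    print("repeatability:", repeatability(records))
```

In this toy data, the woman imaged with `device_B` is misclassified the same way on both images, so accuracy for that device is low while repeatability stays high. This shows only that the two quantities are computed independently; it is not a reproduction of the paper's results.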
