The Subgroup Imperative: Chest Radiograph Classifier Generalization Gaps in Patient, Setting, and Pathology Subgroups.

Journal: Radiology-Artificial Intelligence; IF 8.1; Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2023-07-12; eCollection Date: 2023-09-01; DOI: 10.1148/ryai.220270
Monish Ahluwalia, Mohamed Abdalla, James Sanayei, Laleh Seyyed-Kalantari, Mohannad Hussain, Amna Ali, Benjamin Fine
Citation count: 4

Abstract

Purpose: To externally test four chest radiograph classifiers on a large, diverse, real-world dataset with robust subgroup analysis.

Materials and methods: In this retrospective study, adult posteroanterior chest radiographs (January 2016-December 2020) and associated radiology reports from Trillium Health Partners in Ontario, Canada, were extracted and de-identified. An open-source natural language processing tool was locally validated and used to generate ground truth labels for the 197 540-image dataset based on the associated radiology report. Four classifiers generated predictions on each chest radiograph. Performance was evaluated using accuracy, positive predictive value, negative predictive value, sensitivity, specificity, F1 score, and Matthews correlation coefficient for the overall dataset and for patient, setting, and pathology subgroups.
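The seven metrics listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code; the function name `binary_metrics` and the convention that 1 means "finding present" are assumptions made for illustration.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics.

    Assumes labels are 0/1 with 1 = finding present (an illustrative
    convention, not taken from the study).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty,
        # which can happen in very small subgroups.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy labels invented for illustration only; not study data.
example = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

Returning NaN for an empty denominator is one possible design choice; it keeps a subgroup with no positive (or no negative) cases from aborting the whole evaluation.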

Results: Classifiers demonstrated 68%-77% accuracy, 64%-75% sensitivity, and 82%-94% specificity on the external testing dataset. Algorithms showed decreased sensitivity for solitary findings (43%-65%), patients younger than 40 years (27%-39%), and patients in the emergency department (38%-60%) and decreased specificity on normal chest radiographs with support devices (59%-85%). Differences in sex and ancestry represented movements along an algorithm's receiver operating characteristic curve.
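Subgroup results of this kind come from computing sensitivity and specificity separately within each stratum. The sketch below shows one way to do that under stated assumptions: the record layout `(subgroup, y_true, y_pred)`, the function name `subgroup_sens_spec`, and the toy records are all hypothetical and do not come from the study. When one subgroup shows higher sensitivity together with lower specificity than another, that pattern is consistent with a shift of operating point along the same receiver operating characteristic curve rather than a change in overall discrimination.

```python
from collections import defaultdict


def subgroup_sens_spec(records):
    """Per-subgroup sensitivity and specificity.

    records: iterable of (subgroup, y_true, y_pred) tuples with 0/1 labels.
    This layout is a hypothetical convention chosen for the example.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, truth, pred in records:
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        counts[group][key] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            # NaN marks a subgroup with no positive or no negative cases.
            "sensitivity": c["tp"] / positives if positives else float("nan"),
            "specificity": c["tn"] / negatives if negatives else float("nan"),
            "n": positives + negatives,
        }
    return results


# Toy records invented for illustration only; not study data.
toy = [
    ("emergency", 1, 1), ("emergency", 1, 0), ("emergency", 0, 0),
    ("inpatient", 1, 1), ("inpatient", 0, 1), ("inpatient", 0, 0),
]
by_setting = subgroup_sens_spec(toy)
```

Reporting `n` alongside each rate is a deliberate choice in this sketch, since rates from small subgroups carry wide uncertainty.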

Conclusion: Performance of deep learning chest radiograph classifiers was subject to patient, setting, and pathology factors, demonstrating that subgroup analysis is necessary to inform implementation and monitor ongoing performance to ensure optimal quality, safety, and equity.

Keywords: Conventional Radiography, Thorax, Ethics, Supervised Learning, Convolutional Neural Network (CNN), Machine Learning Algorithms

Supplemental material is available for this article. © RSNA, 2023

See also the commentary by Huisman and Hannink in this issue.

Source journal
CiteScore: 16.20
Self-citation rate: 1.00%
Articles published: 0
Journal description: Radiology: Artificial Intelligence is a bi-monthly publication that focuses on the emerging applications of machine learning and artificial intelligence in the field of imaging across various disciplines. This journal is available online and accepts multiple manuscript types, including Original Research, Technical Developments, Data Resources, Review articles, Editorials, Letters to the Editor and Replies, Special Reports, and AI in Brief.
Latest articles from this journal
AI-integrated Screening to Replace Double Reading of Mammograms: A Population-wide Accuracy and Feasibility Study.
Deep Learning Segmentation of Ascites on Abdominal CT Scans for Automatic Volume Quantification.
Presurgical Upgrade Prediction of DCIS to Invasive Ductal Carcinoma Using Time-dependent Deep Learning Models with DCE MRI.
Artificial Intelligence Outcome Prediction in Neonates with Encephalopathy (AI-OPiNE).
Deep Learning to Detect Intracranial Hemorrhage in a National Teleradiology Program and the Impact on Interpretation Time.