下载PDF
{"title":"要求分组:胸部射线照片分类器患者、环境和病理学分组中的通用差距。","authors":"Monish Ahluwalia, Mohamed Abdalla, James Sanayei, Laleh Seyyed-Kalantari, Mohannad Hussain, Amna Ali, Benjamin Fine","doi":"10.1148/ryai.220270","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To externally test four chest radiograph classifiers on a large, diverse, real-world dataset with robust subgroup analysis.</p><p><strong>Materials and methods: </strong>In this retrospective study, adult posteroanterior chest radiographs (January 2016-December 2020) and associated radiology reports from Trillium Health Partners in Ontario, Canada, were extracted and de-identified. An open-source natural language processing tool was locally validated and used to generate ground truth labels for the 197 540-image dataset based on the associated radiology report. Four classifiers generated predictions on each chest radiograph. Performance was evaluated using accuracy, positive predictive value, negative predictive value, sensitivity, specificity, F1 score, and Matthews correlation coefficient for the overall dataset and for patient, setting, and pathology subgroups.</p><p><strong>Results: </strong>Classifiers demonstrated 68%-77% accuracy, 64%-75% sensitivity, and 82%-94% specificity on the external testing dataset. Algorithms showed decreased sensitivity for solitary findings (43%-65%), patients younger than 40 years (27%-39%), and patients in the emergency department (38%-60%) and decreased specificity on normal chest radiographs with support devices (59%-85%). Differences in sex and ancestry represented movements along an algorithm's receiver operating characteristic curve.</p><p><strong>Conclusion: </strong>Performance of deep learning chest radiograph classifiers was subject to patient, setting, and pathology factors, demonstrating that subgroup analysis is necessary to inform implementation and monitor ongoing performance to ensure optimal quality, safety, and equity.<b>Keywords:</b> Conventional Radiography, Thorax, Ethics, Supervised Learning, Convolutional Neural Network (CNN), Machine Learning Algorithms <i>Supplemental material is available for this article.</i> © RSNA, 2023See also the commentary by Huisman and Hannink in this issue.</p>","PeriodicalId":29787,"journal":{"name":"Radiology-Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":8.1000,"publicationDate":"2023-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10546359/pdf/ryai.220270.pdf","citationCount":"4","resultStr":"{\"title\":\"The Subgroup Imperative: Chest Radiograph Classifier Generalization Gaps in Patient, Setting, and Pathology Subgroups.\",\"authors\":\"Monish Ahluwalia, Mohamed Abdalla, James Sanayei, Laleh Seyyed-Kalantari, Mohannad Hussain, Amna Ali, Benjamin Fine\",\"doi\":\"10.1148/ryai.220270\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>To externally test four chest radiograph classifiers on a large, diverse, real-world dataset with robust subgroup analysis.</p><p><strong>Materials and methods: </strong>In this retrospective study, adult posteroanterior chest radiographs (January 2016-December 2020) and associated radiology reports from Trillium Health Partners in Ontario, Canada, were extracted and de-identified. An open-source natural language processing tool was locally validated and used to generate ground truth labels for the 197 540-image dataset based on the associated radiology report. Four classifiers generated predictions on each chest radiograph. Performance was evaluated using accuracy, positive predictive value, negative predictive value, sensitivity, specificity, F1 score, and Matthews correlation coefficient for the overall dataset and for patient, setting, and pathology subgroups.</p><p><strong>Results: </strong>Classifiers demonstrated 68%-77% accuracy, 64%-75% sensitivity, and 82%-94% specificity on the external testing dataset. Algorithms showed decreased sensitivity for solitary findings (43%-65%), patients younger than 40 years (27%-39%), and patients in the emergency department (38%-60%) and decreased specificity on normal chest radiographs with support devices (59%-85%). Differences in sex and ancestry represented movements along an algorithm's receiver operating characteristic curve.</p><p><strong>Conclusion: </strong>Performance of deep learning chest radiograph classifiers was subject to patient, setting, and pathology factors, demonstrating that subgroup analysis is necessary to inform implementation and monitor ongoing performance to ensure optimal quality, safety, and equity.<b>Keywords:</b> Conventional Radiography, Thorax, Ethics, Supervised Learning, Convolutional Neural Network (CNN), Machine Learning Algorithms <i>Supplemental material is available for this article.</i> © RSNA, 2023See also the commentary by Huisman and Hannink in this issue.</p>\",\"PeriodicalId\":29787,\"journal\":{\"name\":\"Radiology-Artificial Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":8.1000,\"publicationDate\":\"2023-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10546359/pdf/ryai.220270.pdf\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Radiology-Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1148/ryai.220270\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/9/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiology-Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1148/ryai.220270","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/9/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 4
引用
批量引用
The Subgroup Imperative: Chest Radiograph Classifier Generalization Gaps in Patient, Setting, and Pathology Subgroups.
Purpose: To externally test four chest radiograph classifiers on a large, diverse, real-world dataset with robust subgroup analysis.
Materials and methods: In this retrospective study, adult posteroanterior chest radiographs (January 2016-December 2020) and associated radiology reports from Trillium Health Partners in Ontario, Canada, were extracted and de-identified. An open-source natural language processing tool was locally validated and used to generate ground truth labels for the 197 540-image dataset based on the associated radiology report. Four classifiers generated predictions on each chest radiograph. Performance was evaluated using accuracy, positive predictive value, negative predictive value, sensitivity, specificity, F1 score, and Matthews correlation coefficient for the overall dataset and for patient, setting, and pathology subgroups.
Results: Classifiers demonstrated 68%-77% accuracy, 64%-75% sensitivity, and 82%-94% specificity on the external testing dataset. Algorithms showed decreased sensitivity for solitary findings (43%-65%), patients younger than 40 years (27%-39%), and patients in the emergency department (38%-60%) and decreased specificity on normal chest radiographs with support devices (59%-85%). Differences in sex and ancestry represented movements along an algorithm's receiver operating characteristic curve.
Conclusion: Performance of deep learning chest radiograph classifiers was subject to patient, setting, and pathology factors, demonstrating that subgroup analysis is necessary to inform implementation and monitor ongoing performance to ensure optimal quality, safety, and equity.Keywords: Conventional Radiography, Thorax, Ethics, Supervised Learning, Convolutional Neural Network (CNN), Machine Learning Algorithms Supplemental material is available for this article. © RSNA, 2023See also the commentary by Huisman and Hannink in this issue.