Computer-aided detection of tuberculosis from chest radiographs in a tuberculosis prevalence survey in South Africa: external validation and modelled impacts of commercially available artificial intelligence software
Lancet Digital Health (Journal Article), published 2024-07-19
DOI: 10.1016/S2589-7500(24)00118-3
https://www.sciencedirect.com/science/article/pii/S2589750024001183
Funding: Government of Canada.
Background
Computer-aided detection (CAD) can help identify people with active tuberculosis left undetected. However, few studies have compared the performance of commercially available CAD products for screening in high tuberculosis and high HIV settings, and there is poor understanding of threshold selection across products in different populations. We aimed to compare CAD products' performance, with further analyses on subgroup performance and threshold selection.
Methods
We evaluated 12 CAD products on a case–control sample of participants from a South African tuberculosis prevalence survey. Only those with microbiological test results were eligible. The primary outcome was product accuracy, compared using the area under the receiver operating characteristic curve (AUC) against microbiological evidence. Threshold analyses were performed based on pre-defined criteria and across all thresholds. We conducted subgroup analyses including age, gender, HIV status, previous tuberculosis history, presence of symptoms, and current smoking status.
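As an illustration of the primary outcome, the AUC can be read as the probability that a randomly chosen bacteriologically positive participant receives a higher CAD score than a randomly chosen negative one. The sketch below computes it that way on made-up scores; the function name `auc` and the data are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Synthetic CAD abnormality scores (assumed, for illustration only).
pos_scores = [0.9, 0.8, 0.7, 0.4]        # microbiologically positive
neg_scores = [0.6, 0.3, 0.2, 0.1, 0.4]   # microbiologically negative

print(f"AUC = {auc(pos_scores, neg_scores):.3f}")
```

The pairwise form is quadratic in sample size, which is acceptable for a sample of a few hundred people; a rank-based implementation would be preferable for larger datasets.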
Findings
Of the 774 people included, 516 were bacteriologically negative and 258 were bacteriologically positive. Diverse accuracy was noted: Lunit and Nexus had AUCs near 0·9, followed by qXR, JF CXR-2, InferRead, Xvision, and ChestEye (AUCs 0·8–0·9). XrayAME, RADIFY, and TiSepX-TB had AUC under 0·8. Thresholds varied notably across these products and different versions of the same products. Certain products (Lunit, Nexus, JF CXR-2, and qXR) maintained high sensitivity (>90%) across a wide threshold range while reducing the number of individuals requiring confirmatory diagnostic testing. All products generally performed worst in older individuals, people with previous tuberculosis, and people with HIV. Variations in thresholds, sensitivity, and specificity existed across groups and settings.
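One way to picture the threshold analysis is to scan candidate thresholds and keep the highest one that still meets a target sensitivity, then report the specificity and the share of people who would be referred for confirmatory testing. The sketch below does this on synthetic data; the 90% target echoes the sensitivity figure above, but the function names, the scores, and the selection rule are assumptions and not the study's pre-defined criteria.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp, fp, tn, fn


def highest_threshold_meeting(scores, labels, target_sensitivity=0.90):
    """Return the highest observed score usable as a threshold while
    keeping sensitivity at or above the target, with summary metrics."""
    best = None
    # Sensitivity can only fall as the threshold rises, so the last
    # threshold that passes in ascending order is the highest one.
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        sensitivity = tp / (tp + fn)
        if sensitivity >= target_sensitivity:
            best = {
                "threshold": t,
                "sensitivity": sensitivity,
                "specificity": tn / (tn + fp),
                # Share of everyone screened who is sent for confirmatory testing.
                "referred_fraction": (tp + fp) / len(scores),
            }
    return best


# Synthetic scores: 10 positives followed by 10 negatives (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.9, 0.8, 0.7, 0.6, 0.2,
          0.1, 0.2, 0.3, 0.4, 0.55, 0.1, 0.2, 0.3, 0.45, 0.65]
labels = [1] * 10 + [0] * 10

result = highest_threshold_meeting(scores, labels)
print(result)
```

Because score distributions differ between products and subgroups, a threshold chosen this way for one product or population would not be expected to transfer to another, which is consistent with the variation reported above.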
Interpretation
Several previously unevaluated products performed similarly to those evaluated by WHO. Thresholds differed across products and demographic subgroups. The rapid emergence of products and versions necessitates a global strategy to validate new versions and software to support CAD product and threshold selections.
About the journal:
The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health.
The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health.
We publish a range of content types including Articles, Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.