Nikos Sourlos, GertJan Pelgrim, Hendrik Joost Wisselink, Xiaofei Yang, Gonda de Jonge, Mieneke Rook, Mathias Prokop, Grigory Sidorenkov, Marcel van Tuinen, Rozemarijn Vliegenthart, Peter M A van Ooijen
{"title":"Effect of emphysema on AI software and human reader performance in lung nodule detection from low-dose chest CT.","authors":"Nikos Sourlos, GertJan Pelgrim, Hendrik Joost Wisselink, Xiaofei Yang, Gonda de Jonge, Mieneke Rook, Mathias Prokop, Grigory Sidorenkov, Marcel van Tuinen, Rozemarijn Vliegenthart, Peter M A van Ooijen","doi":"10.1186/s41747-024-00459-9","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Emphysema influences the appearance of lung tissue in computed tomography (CT). We evaluated whether this affects lung nodule detection by artificial intelligence (AI) and human readers (HR).</p><p><strong>Methods: </strong>Individuals were selected from the \"Lifelines\" cohort who had undergone low-dose chest CT. Nodules in individuals without emphysema were matched to similar-sized nodules in individuals with at least moderate emphysema. AI results for nodular findings of 30-100 mm<sup>3</sup> and 101-300 mm<sup>3</sup> were compared to those of HR; two expert radiologists blindly reviewed discrepancies. Sensitivity and false positives (FPs)/scan were compared for emphysema and non-emphysema groups.</p><p><strong>Results: </strong>Thirty-nine participants with and 82 without emphysema were included (n = 121, aged 61 ± 8 years (mean ± standard deviation), 58/121 males (47.9%)). AI and HR detected 196 and 206 nodular findings, respectively, yielding 109 concordant nodules and 184 discrepancies, including 118 true nodules. For AI, sensitivity was 0.68 (95% confidence interval 0.57-0.77) in emphysema versus 0.71 (0.62-0.78) in non-emphysema, with FPs/scan 0.51 and 0.22, respectively (p = 0.028). For HR, sensitivity was 0.76 (0.65-0.84) and 0.80 (0.72-0.86), with FPs/scan of 0.15 and 0.27 (p = 0.230). Overall sensitivity was slightly higher for HR than for AI, but this difference disappeared after the exclusion of benign lymph nodes. FPs/scan were higher for AI in emphysema than in non-emphysema (p = 0.028), while FPs/scan for HR were higher than AI for 30-100 mm<sup>3</sup> nodules in non-emphysema (p = 0.009).</p><p><strong>Conclusions: </strong>AI resulted in more FPs/scan in emphysema compared to non-emphysema, a difference not observed for HR.</p><p><strong>Relevance statement: </strong>In the creation of a benchmark dataset to validate AI software for lung nodule detection, the inclusion of emphysema cases is important due to the additional number of FPs.</p><p><strong>Key points: </strong>• The sensitivity of nodule detection by AI was similar in emphysema and non-emphysema. • AI had more FPs/scan in emphysema compared to non-emphysema. • Sensitivity and FPs/scan by the human reader were comparable for emphysema and non-emphysema. • Emphysema and non-emphysema representation in benchmark dataset is important for validating AI.</p>","PeriodicalId":36926,"journal":{"name":"European Radiology Experimental","volume":null,"pages":null},"PeriodicalIF":3.7000,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11102890/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Radiology Experimental","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s41747-024-00459-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Emphysema influences the appearance of lung tissue in computed tomography (CT). We evaluated whether this affects lung nodule detection by artificial intelligence (AI) and human readers (HR).
Methods: Individuals were selected from the "Lifelines" cohort who had undergone low-dose chest CT. Nodules in individuals without emphysema were matched to similar-sized nodules in individuals with at least moderate emphysema. AI results for nodular findings of 30-100 mm3 and 101-300 mm3 were compared to those of HR; two expert radiologists blindly reviewed discrepancies. Sensitivity and false positives (FPs)/scan were compared for emphysema and non-emphysema groups.
Results: Thirty-nine participants with and 82 without emphysema were included (n = 121, aged 61 ± 8 years (mean ± standard deviation), 58/121 males (47.9%)). AI and HR detected 196 and 206 nodular findings, respectively, yielding 109 concordant nodules and 184 discrepancies, including 118 true nodules. For AI, sensitivity was 0.68 (95% confidence interval 0.57-0.77) in emphysema versus 0.71 (0.62-0.78) in non-emphysema, with FPs/scan 0.51 and 0.22, respectively (p = 0.028). For HR, sensitivity was 0.76 (0.65-0.84) and 0.80 (0.72-0.86), with FPs/scan of 0.15 and 0.27 (p = 0.230). Overall sensitivity was slightly higher for HR than for AI, but this difference disappeared after the exclusion of benign lymph nodes. FPs/scan were higher for AI in emphysema than in non-emphysema (p = 0.028), while FPs/scan for HR were higher than AI for 30-100 mm3 nodules in non-emphysema (p = 0.009).
Conclusions: AI resulted in more FPs/scan in emphysema compared to non-emphysema, a difference not observed for HR.
Relevance statement: In the creation of a benchmark dataset to validate AI software for lung nodule detection, the inclusion of emphysema cases is important due to the additional number of FPs.
Key points: • The sensitivity of nodule detection by AI was similar in emphysema and non-emphysema. • AI had more FPs/scan in emphysema compared to non-emphysema. • Sensitivity and FPs/scan by the human reader were comparable for emphysema and non-emphysema. • Emphysema and non-emphysema representation in benchmark dataset is important for validating AI.