Effect of emphysema on AI software and human reader performance in lung nodule detection from low-dose chest CT.

European Radiology Experimental (IF 3.7; JCR Q1, Radiology, Nuclear Medicine & Medical Imaging). Pub Date: 2024-05-20. DOI: 10.1186/s41747-024-00459-9

Nikos Sourlos, GertJan Pelgrim, Hendrik Joost Wisselink, Xiaofei Yang, Gonda de Jonge, Mieneke Rook, Mathias Prokop, Grigory Sidorenkov, Marcel van Tuinen, Rozemarijn Vliegenthart, Peter M A van Ooijen

European Radiology Experimental 8(1):63. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11102890/pdf/


Abstract

Background: Emphysema influences the appearance of lung tissue in computed tomography (CT). We evaluated whether this affects lung nodule detection by artificial intelligence (AI) and human readers (HR).

Methods: Individuals were selected from the "Lifelines" cohort who had undergone low-dose chest CT. Nodules in individuals without emphysema were matched to similar-sized nodules in individuals with at least moderate emphysema. AI results for nodular findings of 30-100 mm³ and 101-300 mm³ were compared to those of HR; two expert radiologists blindly reviewed discrepancies. Sensitivity and false positives (FPs)/scan were compared for emphysema and non-emphysema groups.
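The two endpoints named in the Methods can be illustrated with a minimal Python sketch. It assumes that sensitivity is true positives divided by all true nodules and that FPs/scan is the false-positive count divided by the number of scans. The abstract does not state how the 95% confidence intervals were computed, so the Wilson score interval used here is an assumption, and the function names and the counts in the usage lines are hypothetical, not study data.

```python
from math import sqrt


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true nodules that the reader (AI or human) detected."""
    return true_positives / (true_positives + false_negatives)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def fps_per_scan(false_positives: int, scans: int) -> float:
    """Average number of false-positive findings per CT scan."""
    return false_positives / scans


# Hypothetical counts for illustration only (not taken from the study).
sens = sensitivity(true_positives=8, false_negatives=2)
low, high = wilson_interval(successes=8, total=10)
rate = fps_per_scan(false_positives=5, scans=10)
print(f"sensitivity={sens:.2f}, 95% CI=({low:.2f}, {high:.2f}), FPs/scan={rate:.2f}")
```

Reporting FPs per scan rather than per finding makes the emphysema and non-emphysema groups comparable even though they contain different numbers of participants.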

Results: Thirty-nine participants with and 82 without emphysema were included (n = 121, aged 61 ± 8 years (mean ± standard deviation), 58/121 males (47.9%)). AI and HR detected 196 and 206 nodular findings, respectively, yielding 109 concordant nodules and 184 discrepancies, including 118 true nodules. For AI, sensitivity was 0.68 (95% confidence interval 0.57-0.77) in emphysema versus 0.71 (0.62-0.78) in non-emphysema, with FPs/scan 0.51 and 0.22, respectively (p = 0.028). For HR, sensitivity was 0.76 (0.65-0.84) and 0.80 (0.72-0.86), with FPs/scan of 0.15 and 0.27 (p = 0.230). Overall sensitivity was slightly higher for HR than for AI, but this difference disappeared after the exclusion of benign lymph nodes. FPs/scan were higher for AI in emphysema than in non-emphysema (p = 0.028), while FPs/scan for HR were higher than AI for 30-100 mm³ nodules in non-emphysema (p = 0.009).
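The counts reported in the Results can be cross-checked with a few lines of Python. This sketch assumes that a concordant nodule is a finding marked by both AI and HR, and that a discrepancy is a finding marked by exactly one of them. The abstract does not define the terms this explicitly, but the reported numbers are consistent with that reading. Variable names are illustrative.

```python
# Counts as reported in the abstract.
ai_findings = 196
hr_findings = 206
concordant = 109
discrepancies = 184

# Assumption: a concordant finding appears in both lists,
# a discrepancy in exactly one of them.
ai_only = ai_findings - concordant
hr_only = hr_findings - concordant
assert ai_only + hr_only == discrepancies

# Distinct nodular findings across both readers under that assumption.
unique_findings = concordant + discrepancies

# Cohort composition as reported.
with_emphysema, without_emphysema = 39, 82
assert with_emphysema + without_emphysema == 121
male_percent = round(100 * 58 / 121, 1)

print(ai_only, hr_only, unique_findings, male_percent)
```

Under this reading, the 184 discrepancies split into findings seen only by AI and findings seen only by HR, and these were the ones sent to the two expert radiologists for blinded review.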

Conclusions: AI resulted in more FPs/scan in emphysema compared to non-emphysema, a difference not observed for HR.

Relevance statement: In the creation of a benchmark dataset to validate AI software for lung nodule detection, the inclusion of emphysema cases is important due to the additional number of FPs.

Key points:
• The sensitivity of nodule detection by AI was similar in emphysema and non-emphysema.
• AI had more FPs/scan in emphysema compared to non-emphysema.
• Sensitivity and FPs/scan by the human reader were comparable for emphysema and non-emphysema.
• Emphysema and non-emphysema representation in benchmark dataset is important for validating AI.




Source journal