Deep Learning Algorithms for Breast Cancer Detection in a UK Screening Cohort: As Stand-alone Readers and Combined with Human Readers.

IF 12.1; Region 1 (Medicine); Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING; Radiology; Pub Date: 2024-11-01; DOI: 10.1148/radiol.233147
Sarah E Hickman, Nicholas R Payne, Richard T Black, Yuan Huang, Andrew N Priest, Sue Hudson, Bahman Kasmai, Arne Juette, Muzna Nanaa, Fiona J Gilbert
Radiology, volume 313, issue 2, e233147. Publication type: Journal Article.
Citations: 0

Abstract

Deep Learning Algorithms for Breast Cancer Detection in a UK Screening Cohort: As Stand-alone Readers and Combined with Human Readers.

Background: Deep learning (DL) algorithms have shown promising results in mammographic screening either compared to a single reader or, when deployed in conjunction with a human reader, compared with double reading.

Purpose: To externally validate the performance of three DL algorithms as mammographic screen readers in an independent UK data set.

Materials and Methods: Three commercial DL algorithms (DL-1, DL-2, and DL-3) were retrospectively investigated from January 2022 to June 2022 using consecutive full-field digital mammograms collected at two UK sites during 1 year (2017). Normal cases with 3-year follow-up and histopathologically proven cancer cases detected either at screening (that round or next) or within the 3-year interval were included. A preset specificity threshold equivalent to a single reader was applied. Performance was evaluated for stand-alone DL reading compared with single human reading, and for DL reading combined with human reading compared with double reading, using sensitivity and specificity as the primary metrics. P < .025 was considered to indicate statistical significance for noninferiority testing.

Results: A total of 26 722 cases (median patient age, 59.0 years [IQR, 54.0-63.0 years]) with mammograms acquired using machines from two vendors were included. Cases included 332 screen-detected, 174 interval, and 254 next-round cancers. Two of three stand-alone DL algorithms achieved noninferior sensitivity (DL-1: 64.8%, P < .001; DL-2: 56.7%, P = .03; DL-3: 58.9%, P < .001) compared with the single first reader (62.8%), and specificity was noninferior for DL-1 (92.8%; P < .001) and DL-2 (96.8%; P < .001) and superior for DL-3 (97.9%; P < .001) compared with the single first reader (96.5%). Combining the DL algorithms with human readers achieved noninferior sensitivity (67.0%, 65.6%, and 65.4% for DL-1, DL-2, and DL-3, respectively; P < .001 for all) compared with double reading (67.4%), and superior specificity (97.4%, 97.6%, and 97.6%; P < .001 for all) compared with double reading (97.1%).

Conclusion: Use of stand-alone DL algorithms in combination with a human reader could maintain screening accuracy while reducing workload.

Published under a CC BY 4.0 license. Supplemental material is available for this article.
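
The abstract names sensitivity and specificity as the primary metrics. As a reading aid, the following is a minimal sketch of how those two quantities are computed from per-case recall decisions and confirmed outcomes. The function name `sensitivity_specificity` and the ten toy cases are hypothetical and are not taken from the study; the sketch does not reproduce the authors' analysis or their noninferiority testing.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    recalled: Sequence[bool], has_cancer: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one reader.

    recalled[i]   -- True if the reader flagged case i for recall.
    has_cancer[i] -- True if case i is a confirmed cancer (ground truth).
    """
    if len(recalled) != len(has_cancer):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(recalled, has_cancer))
    tp = sum(1 for r, c in pairs if r and c)          # cancers recalled
    fn = sum(1 for r, c in pairs if not r and c)      # cancers missed
    tn = sum(1 for r, c in pairs if not r and not c)  # normals not recalled
    fp = sum(1 for r, c in pairs if r and not c)      # normals recalled

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer case and one normal case")

    sensitivity = tp / (tp + fn)  # share of cancers that were recalled
    specificity = tn / (tn + fp)  # share of normals that were not recalled
    return sensitivity, specificity


# Hypothetical toy data (NOT from the study): 4 cancers, 6 normal cases.
has_cancer = [True, True, True, True, False, False, False, False, False, False]
recalled = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(recalled, has_cancer)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Raising a reader's recall threshold generally trades sensitivity for specificity, which is why the study fixes a preset specificity threshold before comparing sensitivities.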

Source journal
Radiology (Medicine: Nuclear Medicine)
CiteScore: 35.20
Self-citation rate: 3.00%
Articles published: 596
Review time: 3.6 months
About the journal: Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant and highest quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies. Radiology publishes cutting edge and impactful imaging research articles in radiology and medical imaging in order to help improve human health.
Latest articles from the journal
Risk Factors for Pneumothorax Following Lung Biopsy: Another Peek at Air Leak.
Sex-specific Associations between Left Ventricular Remodeling at MRI and Long-term Cardiovascular Risk.
The Clinical Weight of Left Ventricular Mass and Shape.
Assessment of Nonmass Lesions Detected with Screening Breast US Based on Mammographic Findings.
CT-guided Coaxial Lung Biopsy: Number of Cores and Association with Complications.