Sarah E Hickman, Nicholas R Payne, Richard T Black, Yuan Huang, Andrew N Priest, Sue Hudson, Bahman Kasmai, Arne Juette, Muzna Nanaa, Fiona J Gilbert
Radiology, volume 313, issue 2, e233147. Published 2024-11-01. DOI: 10.1148/radiol.233147 (https://doi.org/10.1148/radiol.233147)
Deep Learning Algorithms for Breast Cancer Detection in a UK Screening Cohort: As Stand-alone Readers and Combined with Human Readers.
Background: Deep learning (DL) algorithms have shown promising results in mammographic screening either compared with a single reader or, when deployed in conjunction with a human reader, compared with double reading.
Purpose: To externally validate the performance of three DL algorithms as mammographic screen readers in an independent UK data set.
Materials and Methods: Three commercial DL algorithms (DL-1, DL-2, and DL-3) were retrospectively investigated from January 2022 to June 2022 using consecutive full-field digital mammograms collected at two UK sites during 1 year (2017). Normal cases with 3-year follow-up and histopathologically proven cancer cases detected either at screening (that round or next) or within the 3-year interval were included. A preset specificity threshold equivalent to a single reader was applied. Performance was evaluated for stand-alone DL reading compared with single human reading, and for DL reading combined with human reading compared with double reading, using sensitivity and specificity as the primary metrics. P < .025 was considered to indicate statistical significance for noninferiority testing.
Results: A total of 26 722 cases (median patient age, 59.0 years [IQR, 54.0-63.0 years]) with mammograms acquired using machines from two vendors were included. Cases included 332 screen-detected, 174 interval, and 254 next-round cancers. Two of three stand-alone DL algorithms achieved noninferior sensitivity (DL-1: 64.8%, P < .001; DL-2: 56.7%, P = .03; DL-3: 58.9%, P < .001) compared with the single first reader (62.8%), and specificity was noninferior for DL-1 (92.8%; P < .001) and DL-2 (96.8%; P < .001) and superior for DL-3 (97.9%; P < .001) compared with the single first reader (96.5%). Combining the DL algorithms with human readers achieved noninferior sensitivity (67.0%, 65.6%, and 65.4% for DL-1, DL-2, and DL-3, respectively; P < .001 for all) compared with double reading (67.4%), and superior specificity (97.4%, 97.6%, and 97.6%; P < .001 for all) compared with double reading (97.1%).
Conclusion: Use of stand-alone DL algorithms in combination with a human reader could maintain screening accuracy while reducing workload.
Published under a CC BY 4.0 license. Supplemental material is available for this article.
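As an illustration of what applying a preset specificity threshold can involve, the following minimal Python sketch picks the lowest score cutoff that reaches a target specificity on normal cases, then reports sensitivity and specificity at that cutoff. It is not the study's method: the function names, the "recall if score >= threshold" rule, and the toy scores are all assumptions made for this example.

```python
import math
from typing import Sequence


def specificity(normal_scores: Sequence[float], threshold: float) -> float:
    """Fraction of normal cases scored below the threshold (not recalled)."""
    return sum(s < threshold for s in normal_scores) / len(normal_scores)


def sensitivity(cancer_scores: Sequence[float], threshold: float) -> float:
    """Fraction of cancer cases scored at or above the threshold (recalled)."""
    return sum(s >= threshold for s in cancer_scores) / len(cancer_scores)


def threshold_for_specificity(normal_scores: Sequence[float], target: float) -> float:
    """Lowest candidate cutoff whose specificity on normal cases reaches the target.

    Candidates are the observed normal scores plus +inf, which recalls nothing
    and therefore always yields a specificity of 1.0.
    """
    for candidate in sorted(set(normal_scores)) + [math.inf]:
        if specificity(normal_scores, candidate) >= target:
            return candidate
    raise ValueError("target specificity must be at most 1.0")


# Hypothetical algorithm scores, invented for illustration only.
normal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
cancer_scores = [0.3, 0.5, 0.96, 0.99]

cutoff = threshold_for_specificity(normal_scores, target=0.85)
spec_at_cutoff = specificity(normal_scores, cutoff)
sens_at_cutoff = sensitivity(cancer_scores, cutoff)
print(f"cutoff={cutoff}, specificity={spec_at_cutoff}, sensitivity={sens_at_cutoff}")
```

In the study the target would correspond to the single reader's specificity; here 0.85 is an arbitrary value chosen so the toy data produce an unambiguous cutoff.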
About the journal:
Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant and highest quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies.
Radiology publishes cutting-edge, impactful research articles in radiology and medical imaging to help improve human health.