AI-powered prostate cancer detection: a multi-centre, multi-scanner validation study

Francesco Giganti, Nadia Moreira da Silva, Michael Yeung, Lucy Davies, Amy Frary, Mirjana Ferrer Rodriguez, Nikita Sushentsev, Nicholas Ashley, Adrian Andreou, Alison Bradley, Chris Wilson, Giles Maskell, Giorgio Brembilla, Iztok Caglic, Jakub Suchánek, Jobie Budd, Zobair Arya, Jonathan Aning, John Hayes, Mark De Bono, Nikhil Vasdev, Nimalan Sanmugalingam, Paul Burn, Raj Persad, Ramona Woitek, Richard Hindley, Sidath Liyanage, Sophie Squire, Tristan Barrett, Steffi Barwick, Mark Hinton, Anwar R Padhani, Antony Rix, Aarti Shah, Evis Sala

European Radiology, published 2025-02-28. DOI: 10.1007/s00330-024-11323-0
Citations: 0
Abstract
Objectives: Multi-centre, multi-vendor validation of artificial intelligence (AI) software to detect clinically significant prostate cancer (PCa) using multiparametric magnetic resonance imaging (MRI) is lacking. We compared a new AI solution, validated on a separate dataset from different UK hospitals, to the original multidisciplinary team (MDT)-supported radiologists' interpretations.
Materials and methods: A Conformité Européenne (CE)-marked deep-learning (DL) computer-aided detection (CAD) medical device (Pi) was trained to detect Gleason Grade Group (GG) ≥ 2 cancer using retrospective data from the PROSTATEx dataset and five UK hospitals (793 patients). Our separate validation dataset was acquired on six machines from two manufacturers across six sites (252 patients). Data included in the study were from MRI scans performed between August 2018 and October 2022. Patients with a negative MRI who did not undergo biopsy were assumed to be negative (90.4% had prostate-specific antigen density < 0.15 ng/mL²). ROC analysis was used to compare Pi with radiologists, who used a 5-category suspicion score.
Results: GG ≥ 2 prevalence in the validation set was 31%. Evaluated per patient, Pi was non-inferior to radiologists (considering a 10% performance difference as acceptable), with an area under the curve (AUC) of 0.91 vs. 0.95. At the predetermined risk threshold of 3.5, the AI software's sensitivity was 95% and specificity 67%, while radiologists at Prostate Imaging-Reporting and Data Systems/Likert ≥ 3 identified GG ≥ 2 with a sensitivity of 99% and specificity of 73%. AI performed well per site (AUC ≥ 0.83) at the patient level, independent of scanner age and field strength.
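To illustrate the metrics reported above, the following is a minimal sketch of how per-patient sensitivity, specificity at a fixed risk threshold, and ROC AUC (via the rank-based Mann-Whitney formulation) are computed. The scores and labels are synthetic and hypothetical, not data from the study; the threshold of 3.5 mirrors the predetermined operating point described in the abstract.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity, treating score >= threshold as a positive call."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical 1-5 risk scores for 8 patients (label 1 = GG >= 2 present)
scores = [4.2, 1.8, 3.6, 2.1, 4.9, 3.1, 1.2, 3.8]
labels = [1,   0,   1,   0,   1,   1,   0,   0]

sens, spec = sens_spec(scores, labels, threshold=3.5)  # 0.75, 0.75
area = auc(scores, labels)                             # 0.875
```

In a real evaluation these would be computed over the full per-patient score distribution, with confidence intervals and a non-inferiority margin as described in the study.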
Conclusion: Real-world data testing suggests that Pi matches the performance of MDT-supported radiologists in GG ≥ 2 PCa detection and generalises to multiple sites, scanner vendors, and models.
Key points:
Question: The performance of artificial intelligence-based medical tools for prostate MRI has yet to be evaluated on multi-centre, multi-vendor data to assess generalisability.
Findings: A dedicated AI medical tool matches the performance of multidisciplinary team-supported radiologists in prostate cancer detection and generalises to multiple sites and scanners.
Clinical relevance: This software has the potential to support the MRI process for biopsy decision-making and target identification, but future prospective studies, where lesions identified by artificial intelligence are biopsied separately, are needed.
Journal overview:
European Radiology (ER) continuously updates scientific knowledge in radiology through the publication of strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses, and information on society matters makes ER an indispensable source of current information in this field.
This is the Journal of the European Society of Radiology, and the official journal of a number of societies.
From 2004 to 2008, supplements to European Radiology were published under its companion title, European Radiology Supplements, ISSN 1613-3749.