The Duke Lung Cancer Screening (DLCS) Dataset: A Reference Dataset of Annotated Low-Dose Screening Thoracic CT.
Avivah J Wang, Fakrul Islam Tushar, Michael R Harowicz, Betty C Tong, Kyle J Lafata, Tina D Tailor, Joseph Y Lo
Radiology: Artificial Intelligence, e240248. DOI: 10.1148/ryai.240248. Published July 1, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12319698/pdf/
Impact of Scanner Manufacturer, Endorectal Coil Use, and Clinical Variables on Deep Learning-assisted Prostate Cancer Classification Using Multiparametric MRI.
José Guilherme de Almeida, Nuno M Rodrigues, Ana Sofia Castro Verde, Ana Mascarenhas Gaivão, Carlos Bilreiro, Inês Santiago, Joana Ip, Sara Belião, Celso Matos, Sara Silva, Manolis Tsiknakis, Kostantinos Marias, Daniele Regge, Nikolaos Papanikolaou
Purpose To assess the effect of scanner manufacturer and scanning protocol on the performance of deep learning models to classify aggressiveness of prostate cancer (PCa) at biparametric MRI (bpMRI). Materials and Methods In this retrospective study, 5478 cases from ProstateNet, a PCa bpMRI dataset with examinations from 13 centers, were used to develop five deep learning (DL) models to predict PCa aggressiveness with minimal lesion information and to test how using data from subgroups defined by scanner manufacturer and endorectal coil (ERC) use (Siemens, Philips, GE with and without ERC, and the full dataset) affects model performance. Performance was assessed using the area under the receiver operating characteristic curve (AUC). The effect of clinical features (age, prostate-specific antigen level, Prostate Imaging Reporting and Data System score) on model performance was also evaluated. Results DL models were trained on 4328 bpMRI cases, and the best model achieved an AUC of 0.73 when trained and tested using data from all manufacturers. Held-out test set performance was higher when models trained with data from one manufacturer were tested on that same manufacturer (within- and between-manufacturer AUC differences of 0.05 on average, P < .001). The addition of clinical features did not improve performance (P = .24). Learning curve analyses showed that performance remained stable as training data increased. Analysis of DL features showed that scanner manufacturer and scanning protocol heavily influenced feature distributions. Conclusion In automated classification of PCa aggressiveness using bpMRI data, scanner manufacturer and ERC use had a major effect on DL model performance and features. Keywords: Convolutional Neural Network (CNN), Computer-aided Diagnosis (CAD), Computer Applications-General (Informatics), Oncology. Supplemental material is available for this article. Published under a CC BY 4.0 license. See also the commentary by Suri and Hsu in this issue.
Radiology: Artificial Intelligence, e230555. DOI: 10.1148/ryai.230555. Published May 1, 2025.
{"title":"Natural Language Processing for Everyone.","authors":"Quirin D Strotzer","doi":"10.1148/ryai.250218","DOIUrl":"10.1148/ryai.250218","url":null,"abstract":"","PeriodicalId":29787,"journal":{"name":"Radiology-Artificial Intelligence","volume":"7 3","pages":"e250218"},"PeriodicalIF":13.2,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144053089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial Intelligence Is Brittle: We Need to Do Better.","authors":"Abhinav Suri, William Hsu","doi":"10.1148/ryai.250081","DOIUrl":"10.1148/ryai.250081","url":null,"abstract":"","PeriodicalId":29787,"journal":{"name":"Radiology-Artificial Intelligence","volume":"7 3","pages":"e250081"},"PeriodicalIF":13.2,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12127952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143812643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development and Validation of a Sham-AI Model for Intracranial Aneurysm Detection at CT Angiography.
Zhao Shi, Bin Hu, Mengjie Lu, Manting Zhang, Haiting Yang, Bo He, Jiyao Ma, Chunfeng Hu, Li Lu, Sheng Li, Shiyu Ren, Yonggao Zhang, Jun Li, Mayidili Nijiati, Jiake Dong, Hao Wang, Zhen Zhou, Fandong Zhang, Chengwei Pan, Yizhou Yu, Zijian Chen, Chang Sheng Zhou, Yongyue Wei, Junlin Zhou, Long Jiang Zhang
Purpose To evaluate a sham-artificial intelligence (AI) model acting as a placebo control for a standard-AI model for diagnosis of intracranial aneurysm. Materials and Methods This retrospective crossover, blinded, multireader, multicase study was conducted from November 2022 to March 2023. A sham-AI model with near-zero sensitivity and similar specificity to a standard AI model was developed using 16 422 CT angiography examinations. Digital subtraction angiography-verified CT angiographic examinations from four hospitals were collected, half of which were processed by standard AI and the others by sham AI to generate sequence A; sequence B was generated in the reverse order. Twenty-eight radiologists from seven hospitals were randomly assigned to either sequence and then assigned to the other sequence after a washout period. The diagnostic performances of radiologists alone, radiologists with standard-AI assistance, and radiologists with sham-AI assistance were compared using sensitivity and specificity, and radiologists' susceptibility to sham-AI suggestions was assessed. Results The testing dataset included 300 patients (median age, 61.0 years [IQR, 52.0-67.0]; 199 male), 50 of whom had aneurysms. Standard AI and sham AI performed as expected (sensitivity, 96.0% vs 0.0%; specificity, 82.0% vs 76.0%). The differences in sensitivity and specificity between standard AI-assisted and sham AI-assisted readings were 20.7% (95% CI: 15.8, 25.5 [superiority]) and 0.0% (95% CI: -2.0, 2.0 [noninferiority]), respectively. The difference between sham AI-assisted readings and radiologists alone was -2.6% (95% CI: -3.8, -1.4 [noninferiority]) for both sensitivity and specificity. After sham-AI suggestions, 5.3% (44 of 823) of true-positive and 1.2% (seven of 577) of false-negative results of radiologists alone were changed. Conclusion Radiologists' diagnostic performance was not compromised when aided by the proposed sham-AI model compared with their unassisted performance. Keywords: CT Angiography, Vascular, Intracranial Aneurysm, Sham AI. Supplemental material is available for this article. Published under a CC BY 4.0 license. See also the commentary by Mayfield and Romero in this issue.
Radiology: Artificial Intelligence, e240140. DOI: 10.1148/ryai.240140. Published May 1, 2025.