Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan.
Atilla P Kiraly, Corbin A Cunningham, Ryan Najafi, Zaid Nabulsi, Jie Yang, Charles Lau, Joseph R Ledsam, Wenxing Ye, Diego Ardila, Scott M McKinney, Rory Pilgrim, Yun Liu, Hiroaki Saito, Yasuteru Shimamura, Mozziyar Etemadi, David Melnick, Sunny Jansen, Greg S Corrado, Lily Peng, Daniel Tse, Shravya Shetty, Shruthi Prabhakara, David P Naidich, Neeral Beladia, Krish Eswaran
Radiology: Artificial Intelligence, e230079 (May 2024). DOI: 10.1148/ryai.230079
Purpose To evaluate the impact of an artificial intelligence (AI) assistant for lung cancer screening on multinational clinical workflows. Materials and Methods An AI assistant for lung cancer screening was evaluated in two retrospective randomized multireader multicase studies in which 627 low-dose chest CT cases (141 cancer positive) were each read twice (with and without AI assistance) by experienced thoracic radiologists (six U.S.-based or six Japan-based radiologists), for a total of 7524 interpretations. Positive cases were defined as those within 2 years before a pathology-confirmed lung cancer diagnosis. Negative cases were defined as those without any subsequent cancer diagnosis for at least 2 years and were enriched for a spectrum of diverse nodules. The studies measured the readers' level of suspicion (on a 0-100 scale), country-specific screening system scoring categories, and management recommendations. Evaluation metrics included the area under the receiver operating characteristic curve (AUC) for level of suspicion and the sensitivity and specificity of recall recommendations. Results With AI assistance, the radiologists' AUC increased by 0.023 (0.70 to 0.72; P = .02) in the U.S. study and by 0.023 (0.93 to 0.96; P = .18) in the Japan study. Scoring system specificity for actionable findings increased 5.5% (57% to 63%; P < .001) in the U.S. study and 6.7% (23% to 30%; P < .001) in the Japan study. There was no evidence of a difference in corresponding sensitivity between unassisted and AI-assisted reads in the U.S. (67.3% to 67.5%; P = .88) and Japan (98% to 100%; P > .99) studies. Stand-alone AI system AUC was 0.75 (95% CI: 0.70, 0.81) and 0.88 (95% CI: 0.78, 0.97) for the U.S.- and Japan-based datasets, respectively. Conclusion The concurrent AI interface improved lung cancer screening specificity in both U.S.- and Japan-based reader studies, meriting further study in additional international screening environments. Keywords: Assistive Artificial Intelligence, Lung Cancer Screening, CT. Supplemental material is available for this article. Published under a CC BY 4.0 license.
{"title":"2023 Manuscript Reviewers: A Note of Thanks.","authors":"Curtis P Langlotz, Charles E Kahn","doi":"10.1148/ryai.240138","DOIUrl":"10.1148/ryai.240138","url":null,"abstract":"","PeriodicalId":29787,"journal":{"name":"Radiology-Artificial Intelligence","volume":"6 2","pages":"e240138"},"PeriodicalIF":9.8,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10982905/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140294780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Curated and Annotated Dataset of Lung US Images in Zambian Children with Clinical Pneumonia.
Lauren Etter, Margrit Betke, Ingrid Y Camelo, Christopher J Gill, Rachel Pieciak, Russell Thompson, Libertario Demi, Umair Khan, Alyse Wheelock, Janet Katanga, Bindu N Setty, Ilse Castro-Aragon
Radiology: Artificial Intelligence, e230147 (March 2024). DOI: 10.1148/ryai.230147
See also the commentary by Sitek in this issue. Supplemental material is available for this article.
Generative Large Language Models for Detection of Speech Recognition Errors in Radiology Reports.
Reuben A Schmidt, Jarrel C Y Seah, Ke Cao, Lincoln Lim, Wei Lim, Justin Yeung
Radiology: Artificial Intelligence, e230205 (March 2024). DOI: 10.1148/ryai.230205
This study evaluated the ability of generative large language models (LLMs) to detect speech recognition errors in radiology reports. A dataset of 3233 CT and MRI reports was assessed by radiologists for speech recognition errors, which were categorized as clinically significant or not clinically significant. The performance of five generative LLMs (GPT-3.5-turbo, GPT-4, text-davinci-003, Llama-v2-70B-chat, and Bard) in detecting these errors was compared, with manual error detection as the reference standard. Prompt engineering was used to optimize model performance. GPT-4 demonstrated high accuracy in detecting clinically significant errors (precision, 76.9%; recall, 100%; F1 score, 86.9%) and errors that were not clinically significant (precision, 93.9%; recall, 94.7%; F1 score, 94.3%). Text-davinci-003 achieved F1 scores of 72% and 46.6% for clinically significant and not clinically significant errors, respectively; GPT-3.5-turbo obtained F1 scores of 59.1% and 32.2%, and Llama-v2-70B-chat scored 72.8% and 47.7%. Bard showed the lowest accuracy, with F1 scores of 47.5% and 20.9%. GPT-4 effectively identified challenging errors such as nonsense phrases and internally inconsistent statements. Longer reports, resident dictation, and overnight shifts were associated with higher error rates. In conclusion, advanced generative LLMs show potential for automatic detection of speech recognition errors in radiology reports. Keywords: CT, Large Language Model, Machine Learning, MRI, Natural Language Processing, Radiology Reports, Speech, Unsupervised Learning. Supplemental material is available for this article.
Multicenter Evaluation of a Weakly Supervised Deep Learning Model for Lymph Node Diagnosis in Rectal Cancer at MRI.
Wei Xia, Dandan Li, Wenguang He, Perry J Pickhardt, Junming Jian, Rui Zhang, Junjie Zhang, Ruirui Song, Tong Tong, Xiaotang Yang, Xin Gao, Yanfen Cui
Radiology: Artificial Intelligence, e230152 (March 2024). DOI: 10.1148/ryai.230152
Purpose To develop a Weakly supervISed model DevelOpment fraMework (WISDOM) for constructing a lymph node (LN) diagnosis model for patients with rectal cancer (RC) that uses preoperative MRI data coupled with postoperative patient-level pathologic information. Materials and Methods In this retrospective study, the WISDOM model was built using MRI (T2-weighted and diffusion-weighted imaging) and patient-level pathologic information (the number of postoperatively confirmed metastatic LNs and resected LNs), based on data from patients with RC treated between January 2016 and November 2017. The incremental value of the model in assisting radiologists was investigated. Performance in binary and ternary N staging was evaluated using the area under the receiver operating characteristic curve (AUC) and the concordance index (C index), respectively. Results A total of 1014 patients (median age, 62 years; IQR, 54-68 years; 590 male) were analyzed, comprising a training cohort (n = 589) and an internal test cohort (n = 146) from center 1 and two external test cohorts (n = 117 and n = 162) from centers 2 and 3. The WISDOM model yielded an overall AUC of 0.81 and a C index of 0.765, significantly outperforming junior radiologists (AUC = 0.69, P < .001; C index = 0.689, P < .001) and performing comparably with senior radiologists (AUC = 0.79, P = .21; C index = 0.788, P = .22). Moreover, the model significantly improved the performance of junior radiologists (AUC = 0.80, P < .001; C index = 0.798, P < .001) and senior radiologists (AUC = 0.88, P < .001; C index = 0.869, P < .001). Conclusion This study demonstrates the potential of WISDOM as a useful LN diagnosis method using routine rectal MRI data. The improved radiologist performance observed with model assistance highlights the potential clinical utility of WISDOM in practice. Keywords: MR Imaging, Abdomen/GI, Rectum, Computer Applications-Detection/Diagnosis. Supplemental material is available for this article. Published under a CC BY 4.0 license.
Olivia Prior, Carlos Macarro, Víctor Navarro, Camilo Monreal, Marta Ligero, Alonso Garcia-Ruiz, Garazi Serna, Sara Simonetti, Irene Braña, Maria Vieito, Manuel Escobar, Jaume Capdevila, Annette T Byrne, Rodrigo Dienstmann, Rodrigo Toledo, Paolo Nuciforo, Elena Garralda, Francesco Grussu, Kinga Bernatowicz, Raquel Perez-Lopez
{"title":"Artificial Intelligence in Radiology: Bridging Global Health Care Gaps through Innovation and Inclusion.","authors":"Arkadiusz Sitek","doi":"10.1148/ryai.240093","DOIUrl":"10.1148/ryai.240093","url":null,"abstract":"","PeriodicalId":29787,"journal":{"name":"Radiology-Artificial Intelligence","volume":"6 2","pages":"e240093"},"PeriodicalIF":9.8,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10982909/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140111551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can AI Predict the Need for Surgery in Traumatic Brain Injury?","authors":"Sven Haller","doi":"10.1148/ryai.230587","DOIUrl":"10.1148/ryai.230587","url":null,"abstract":"","PeriodicalId":29787,"journal":{"name":"Radiology-Artificial Intelligence","volume":"6 2","pages":"e230587"},"PeriodicalIF":9.8,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10982907/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139730559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}