Evaluation of radiology residents' reporting skills using large language models: an observational study.

Japanese Journal of Radiology, pp. 1204-1212; Pub Date: 2025-07-01; Epub Date: 2025-03-08; DOI: 10.1007/s11604-025-01764-y; IF 2.1 (CAS Zone 4, Medicine)
Natsuko Atsukawa, Hiroyuki Tatekawa, Tatsushi Oura, Shu Matsushita, Daisuke Horiuchi, Hirotaka Takita, Yasuhito Mitsuyama, Ayako Omori, Taro Shimono, Yukio Miki, Daiju Ueda
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204868/pdf/
Citations: 0

Abstract

Purpose: Large language models (LLMs) have the potential to objectively evaluate radiology residents' reports; however, research on their use for feedback in radiology training and for assessing residents' skill development remains limited. This study aimed to assess the effectiveness of LLMs in revising radiology reports by comparing LLM revisions with reports verified by board-certified radiologists, and to analyze the progression of residents' reporting skills over time.

Materials and methods: To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from 7376 reports authored by nine first-year radiology residents. The reports were evaluated against six criteria: (1) addition of missing positive findings, (2) deletion of findings, (3) addition of negative findings, (4) correction of the expression of findings, (5) correction of the diagnosis, and (6) proposal of additional examinations or treatments. Reports were divided into four time-based terms, and 900 reports (450 CT and 450 MRI) were randomly chosen from the initial and final terms of the residents' first year. The revision rates for each criterion were compared between the first and last terms using the Wilcoxon signed-rank test.
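The per-criterion comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each report is reduced to the set of criteria (1-6) for which the LLM proposed a revision, that a resident's revision rate for a criterion is the fraction of that resident's reports flagged for it, and that first- and last-term rates are paired by resident. The helper names (`revision_rate`, `average_ranks`, `wilcoxon_signed_rank`) and all numbers are hypothetical. The exact test enumerates every sign assignment, which is practical only for small samples such as nine residents, and it drops zero differences, one of several conventions, so p-values may differ slightly from packages that handle zeros or ties differently.

```python
from itertools import product

# The six revision criteria listed in the Materials and methods.
CRITERIA = {
    1: "addition of missing positive findings",
    2: "deletion of findings",
    3: "addition of negative findings",
    4: "correction of the expression of findings",
    5: "correction of the diagnosis",
    6: "proposal of additional examinations or treatments",
}


def revision_rate(reports: list[set[int]], criterion: int) -> float:
    """Fraction of reports flagged for revision under `criterion`.

    ASSUMPTION: each report is reduced to the set of criterion numbers
    for which the LLM proposed a revision.
    """
    if not reports:
        raise ValueError("no reports")
    return sum(criterion in flags for flags in reports) / len(reports)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks of `values`, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(first: list[float], last: list[float]) -> tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test on paired samples.

    Zero differences are dropped, and the null distribution of W+ is
    enumerated over all 2**n sign assignments (fine for n around 9).
    Returns (W+, p-value).
    """
    diffs = [b - a for a, b in zip(first, last) if b != a]
    if not diffs:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    n_le = n_ge = 0
    for signs in product((False, True), repeat=len(diffs)):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        n_le += w <= w_plus + 1e-9
        n_ge += w >= w_plus - 1e-9
    p = min(1.0, 2 * min(n_le, n_ge) / 2 ** len(diffs))
    return w_plus, p


# Hypothetical flagged criteria for four reports (NOT the study's data).
reports = [{1, 4}, set(), {1}, {2, 5}]
rates = {c: revision_rate(reports, c) for c in CRITERIA}
print(rates[1])  # 0.5

# Hypothetical first- and last-term revision rates for one criterion,
# paired by resident (illustrative numbers, NOT the study's data).
first_term = [0.40, 0.35, 0.50, 0.45, 0.30, 0.55, 0.42, 0.38, 0.48]
last_term = [0.20, 0.25, 0.30, 0.22, 0.18, 0.35, 0.21, 0.30, 0.28]
print(wilcoxon_signed_rank(first_term, last_term))  # (0, 0.00390625)
```

Because every hypothetical resident's rate falls, W+ is 0 and the exact two-sided p-value is 2/512. In practice a library routine such as `scipy.stats.wilcoxon` would normally replace the hand-rolled enumeration; it is written out here only to make the statistic visible.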

Results: Among the three LLMs evaluated (ChatGPT-4 Omni [GPT-4o], Claude-3.5 Sonnet, and Claude-3 Opus), GPT-4o showed the highest agreement with board-certified radiologists. Using GPT-4o, significant improvements were found for Criteria 1–3 between reports from the first and last terms (Criteria 1, 2, and 3: P < 0.001, P = 0.023, and P = 0.004, respectively). No significant changes were observed for Criteria 4–6. Nevertheless, all criteria except Criterion 6 showed progressive improvement over time.
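The abstract does not state which statistic was used to measure agreement between each LLM and the board-certified radiologists. As one plausible setup, offered only as an assumption, the sketch below treats each criterion as a binary revise/no-revise call per report, computes Cohen's kappa between a model and the radiologists for each criterion, and selects the model with the highest mean kappa. The helper names, model labels (`model_A`, `model_B`), and flags are hypothetical placeholders, not the study's models or data.

```python
# Sketch: choose the LLM whose per-criterion revise/no-revise calls agree best
# with radiologists. Cohen's kappa is an ASSUMED metric; the abstract does not
# name the statistic the authors used.

Flags = dict[int, list[bool]]  # criterion number -> one boolean per report


def cohen_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters making binary calls on the same reports."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # share of "revise" calls by rater A
    p_b = sum(rater_b) / n  # share of "revise" calls by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1:
        # Both raters gave the same constant call, so kappa is undefined.
        # Treating it as perfect agreement is a convention chosen here.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_kappa(model: Flags, reference: Flags) -> float:
    """Average kappa across the criteria present in the reference ratings."""
    kappas = [cohen_kappa(model[c], reference[c]) for c in reference]
    return sum(kappas) / len(kappas)


def best_model(candidates: dict[str, Flags], reference: Flags) -> str:
    """Name of the candidate with the highest mean kappa against the reference."""
    return max(candidates, key=lambda name: mean_kappa(candidates[name], reference))


# Hypothetical flags for four reports on one criterion (NOT study data).
radiologists = {1: [True, True, False, False]}
models = {
    "model_A": {1: [True, False, False, False]},
    "model_B": {1: [True, True, False, False]},
}
print(best_model(models, radiologists))  # model_B
```

Kappa is used in this sketch rather than raw percent agreement because revisions under some criteria may be rare, and raw agreement would then be inflated by the many reports that both raters leave unrevised.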

Conclusion: LLMs can effectively provide feedback on commonly corrected areas of radiology reports, enabling residents to objectively identify and address their weaknesses and to monitor their progress. Additionally, LLMs may help reduce the mentoring workload of supervising radiologists.

Source journal: Japanese Journal of Radiology
Subject category: Medicine-Radiology, Nuclear Medicine and Imaging
Self-citation rate: 4.80%
Articles published: 133
About the journal: Japanese Journal of Radiology is a peer-reviewed journal, officially published by the Japan Radiological Society. The main purpose of the journal is to provide a forum for the publication of papers documenting recent advances and new developments in the field of radiology in medicine and biology. The scope of Japanese Journal of Radiology encompasses, but is not restricted to, diagnostic radiology, interventional radiology, radiation oncology, nuclear medicine, radiation physics, and radiation biology. Additionally, the journal covers technical and industrial innovations. The journal welcomes original articles, technical notes, review articles, pictorial essays, and letters to the editor. The journal also provides announcements from the boards and the committees of the society. Membership in the Japan Radiological Society is not a prerequisite for submission. Contributions are welcomed from all parts of the world.
Latest articles in this journal:
18F-FAPI-74 versus 18F-FDG in synchronous pancreatic cancer and follicular lymphoma.
High-resolution chest CT using 1024-matrix reconstruction: phantom and clinical evaluation of image quality and post-processing capability.
One-stop evaluation using [68Ga]Ga-Pentixafor PET integrated with contrast-enhanced CT for visualization and localization of adrenal nodules in patients with primary aldosteronism.
Blood oxygenation level-dependent MRI in critical limb-threatening ischemia: association with clinical and angiographic findings and response to revascularization.
Current implementation and perception of palliative interventional radiology procedures for patients with refractory cancer pain among interventional radiologists in Japan: a nationwide survey.