Automatic structuring of radiology reports with on-premise open-source large language models.

IF 4.7 · CAS Zone 2 (Medicine) · JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging) · European Radiology · Pub Date: 2024-10-10 · DOI: 10.1007/s00330-024-11074-y
Piotr Woźnicki, Caroline Laqua, Ina Fiku, Amar Hekalo, Daniel Truhn, Sandy Engelhardt, Jakob Kather, Sebastian Foersch, Tugba Akinci D'Antonoli, Daniel Pinto Dos Santos, Bettina Baeßler, Fabian Christopher Laqua

Abstract

Objectives: Structured reporting enhances comparability, readability, and content detail. Large language models (LLMs) could convert free text into structured data without disrupting radiologists' reporting workflow. This study evaluated an on-premise, privacy-preserving LLM for automatically structuring free-text radiology reports.

Materials and methods: We developed an approach to controlling the LLM output, ensuring the validity and completeness of structured reports produced by a locally hosted Llama-2-70B-chat model. A dataset of de-identified narrative chest radiograph (CXR) reports was compiled retrospectively. It included 202 English reports from the publicly available MIMIC-CXR dataset and 197 German reports from our university hospital. A senior radiologist prepared a detailed, fully structured reporting template with 48 question-answer pairs. All reports were independently structured by the LLM and two human readers. Bayesian inference (Markov chain Monte Carlo sampling) was used to estimate the distributions of the Matthews correlation coefficient (MCC), with [-0.05, 0.05] as the region of practical equivalence (ROPE).
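The abstract does not describe how the LLM output was controlled, so the following is only a minimal sketch of what "valid and complete" could mean for a template of question-answer pairs. It is a post-hoc check, not the authors' method. The two template items, the answer options, and the function name `validate_structured_report` are hypothetical; the study's template has 48 question-answer pairs.

```python
from typing import Dict, List

# Hypothetical two-item excerpt of a structured reporting template.
# Keys are questions; values are the answers the template allows.
TEMPLATE: Dict[str, List[str]] = {
    "pleural_effusion": ["yes", "no"],
    "pneumothorax": ["yes", "no"],
}


def validate_structured_report(
    report: Dict[str, str], template: Dict[str, List[str]]
) -> List[str]:
    """Return a list of problems; an empty list means valid and complete."""
    problems: List[str] = []
    # Completeness and validity: every template question needs an allowed answer.
    for question, allowed in template.items():
        if question not in report:
            problems.append(f"missing answer: {question}")
        elif report[question] not in allowed:
            problems.append(
                f"invalid answer for {question}: {report[question]!r}"
            )
    # Validity: the report must not contain questions outside the template.
    for question in report:
        if question not in template:
            problems.append(f"unexpected question: {question}")
    return problems


if __name__ == "__main__":
    print(validate_structured_report(
        {"pleural_effusion": "yes", "pneumothorax": "no"}, TEMPLATE
    ))
    print(validate_structured_report({"pleural_effusion": "maybe"}, TEMPLATE))
```

A check of this kind only detects malformed output after generation; guaranteeing validity "in all cases", as reported, would require constraining the output during generation or retrying on failure.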

Results: The LLM generated valid structured reports in all cases, achieving an average MCC of 0.75 (94% HDI: 0.70-0.80) and F1 score of 0.70 (0.70-0.80) for English, and 0.66 (0.62-0.70) and 0.68 (0.64-0.72) for German reports, respectively. The MCC differences between the LLM and the human readers were within the ROPE for both languages: 0.01 (-0.05 to 0.07), 0.01 (-0.05 to 0.07) for English, and -0.01 (-0.07 to 0.05), 0.00 (-0.06 to 0.06) for German, indicating approximately comparable performance.
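For readers unfamiliar with the metrics, the sketch below computes MCC and F1 from binary confusion counts and checks whether a difference falls inside the ROPE of [-0.05, 0.05]. It yields point estimates only and does not reproduce the study's Bayesian (MCMC) estimation of the MCC distributions. The function names and the example counts are illustrative assumptions; only the ROPE bounds and the reported differences come from the abstract.

```python
import math
from typing import Tuple


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score, the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def within_rope(
    value: float, rope: Tuple[float, float] = (-0.05, 0.05)
) -> bool:
    """True if a point estimate lies inside the region of practical equivalence."""
    return rope[0] <= value <= rope[1]


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
    print(round(f1(tp=40, fp=5, fn=10), 3))
    # Mean MCC differences reported in the abstract (LLM vs. human readers).
    for diff in (0.01, 0.01, -0.01, 0.00):
        print(diff, within_rope(diff))
```

Each reported mean difference lies inside the ROPE, although the reported intervals extend slightly beyond it (for example, up to 0.07). The abstract does not state which decision rule was applied, so this sketch checks point estimates only.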

Conclusion: Locally hosted, open-source LLMs can automatically structure free-text radiology reports with approximately human accuracy. However, the understanding of semantics varied across languages and imaging findings.

Key points: Question: Why has structured reporting not been widely adopted in radiology despite clear benefits, and how can we improve this? Findings: A locally hosted large language model successfully structured narrative reports, showing variation between languages and findings. Critical relevance: Structured reporting provides many benefits, but its integration into the clinical routine is limited. Automating the extraction of structured information from radiology reports enables the capture of structured data while allowing the radiologist to maintain their reporting workflow.
