Multilingual feasibility of GPT-4o for automated Voice-to-Text CT and MRI report transcription

European Journal of Radiology (IF 3.3; Zone 3, Medicine; JCR Q1, Radiology, Nuclear Medicine & Medical Imaging). Pub Date: 2025-01-01. Epub Date: 2024-11-17. DOI: 10.1016/j.ejrad.2024.111827

Felix Busch, Philipp Prucker, Alexander Komenda, Sebastian Ziegelmayer, Marcus R Makowski, Keno K Bressem, Lisa C Adams

European Journal of Radiology, volume 182, Article 111827. Journal Article. https://www.sciencedirect.com/science/article/pii/S0720048X24005436

Abstract

Purpose

Large language models (LLMs) promise to streamline radiology reporting. With the release of OpenAI’s GPT-4o (Generative Pre-trained Transformers-4 omni), which processes not only text but also speech, multimodal LLMs might now also be used as medical speech recognition software for radiology reporting in multiple languages. This proof-of-concept study investigates the feasibility of using GPT-4o for automated voice-to-text transcription of radiology reports in English and German.

Methods

Three readers with varying levels of experience each dictated 100 synthetic radiology reports in both languages using GPT-4o via the ChatGPT iOS mobile application. Reports included CT and MRI scans of various anatomical regions. Evaluation metrics included error type, severity, and correction time. BERTScore and ROUGE metrics were calculated to assess semantic similarity and n-gram overlap between dictated and original reports.
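
The n-gram overlap that ROUGE measures can be illustrated with a minimal sketch. This is not the authors' evaluation pipeline: the function names `ngram_counts` and `rouge_n`, the whitespace tokenization, and the two example sentences are assumptions made purely for illustration, and published ROUGE implementations add steps such as stemming that are omitted here.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap between two texts.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref = ngram_counts(reference.lower().split(), n)
    cand = ngram_counts(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical original report sentence and its dictated transcription.
original = "No evidence of acute intracranial hemorrhage"
dictated = "No evidence of acute intracranial haemorrhage"

print(rouge_n(original, dictated, n=1))
print(rouge_n(original, dictated, n=2))
```

In this toy pair a single spelling variant lowers the unigram overlap to five of six tokens and the bigram overlap to four of five. Exact-match n-gram metrics penalize such surface differences, which is one reason to pair them with an embedding-based measure such as BERTScore.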

Results

Correction time did not differ significantly between languages, but it did differ between readers according to their experience. Error rates were similar for both languages. Most errors were minor (92.68%, n = 114/123 German; 94.74%, n = 90/95 English), and the majority were technical (27.04%, n = 43/159 German; 35.65%, n = 41/115 English) or typographical (23.9%, n = 38/159 German; 27.83%, n = 32/115 English). BERTScore metrics were significantly higher for German, while ROUGE metrics showed no significant differences between languages.
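
As a quick consistency check, the percentages reported above can be recomputed from the stated counts. The sketch below is illustrative only; the list `reported` and the helper `percent` are names introduced here, and the figures are copied from the abstract.

```python
# (label, language, count, total, reported percentage) as given in the abstract.
reported = [
    ("minor", "German", 114, 123, 92.68),
    ("minor", "English", 90, 95, 94.74),
    ("technical", "German", 43, 159, 27.04),
    ("technical", "English", 41, 115, 35.65),
    ("typographical", "German", 38, 159, 23.90),
    ("typographical", "English", 32, 115, 27.83),
]


def percent(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


for label, language, count, total, stated in reported:
    computed = percent(count, total)
    status = "ok" if abs(computed - stated) < 0.005 else "mismatch"
    print(f"{label:14s} {language:8s} {count}/{total} = {computed:.2f}% ({status})")
```

Note that the severity figures and the error-type figures use different denominators (123 vs. 159 for German, 95 vs. 115 for English), so the two sets of percentages are not directly comparable.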

Conclusion

This study demonstrates the potential of GPT-4o for multilingual transcription of radiology reports, effectively handling both English and German with minimal errors and high semantic understanding. Future research should compare GPT-4o with current radiology dictation tools, assessing performance, cost-effectiveness, and multilingual capabilities across diverse speaker populations.

Source journal

European Journal of Radiology (Medicine: Nuclear Medicine)

CiteScore: 6.70
Self-citation rate: 3.00%
Articles published: 398
Review time: 42 days

Journal description: European Journal of Radiology is an international journal that aims to communicate state-of-the-art information on imaging developments to its readers in the form of high-quality original research articles and timely reviews of current developments in the field. Its audience includes clinicians at all levels of training, including radiology trainees, newly qualified imaging specialists, and experienced radiologists. Its aim is to inform efficient, appropriate, and evidence-based imaging practice to the benefit of patients worldwide.