Felix Busch, Philipp Prucker, Alexander Komenda, Sebastian Ziegelmayer, Marcus R Makowski, Keno K Bressem, Lisa C Adams
Multilingual feasibility of GPT-4o for automated Voice-to-Text CT and MRI report transcription
European Journal of Radiology, volume 182, article 111827. Published 2024-11-17.
DOI: 10.1016/j.ejrad.2024.111827 (https://doi.org/10.1016/j.ejrad.2024.111827)
Abstract
Purpose: Large language models (LLMs) promise to streamline radiology reporting. With the release of OpenAI's GPT-4o (Generative Pre-trained Transformers-4 omni), which processes not only text but also speech, multimodal LLMs might now also be used as medical speech recognition software for radiology reporting in multiple languages. This proof-of-concept study investigates the feasibility of using GPT-4o for automated voice-to-text transcription of radiology reports in English and German.
Methods: Three readers with varying levels of experience each dictated 100 synthetic radiology reports in both languages using GPT-4o via the ChatGPT iOS mobile application. Reports included CT and MRI scans of various anatomical regions. Evaluation metrics included error type, severity, and correction time. BERTScore and ROUGE metrics were calculated to assess semantic similarity and n-gram overlap between dictated and original reports.
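The n-gram overlap that ROUGE measures can be illustrated with a small, dependency-free sketch. This is not the authors' evaluation code: the whitespace tokenizer, the function names, and the example sentences are assumptions made for illustration, and published ROUGE implementations typically add stemming and other normalization. BERTScore is not shown because it requires a pretrained language model.

```python
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def f1(overlap, ref_len, cand_len):
    # Harmonic mean of precision (overlap / candidate) and recall (overlap / reference).
    if overlap == 0 or ref_len == 0 or cand_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n=1):
    # ROUGE-N F1: clipped n-gram overlap between reference and candidate.
    ref, cand = tokenize(reference), tokenize(candidate)
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    overlap = sum((ref_ngrams & cand_ngrams).values())
    return f1(overlap, sum(ref_ngrams.values()), sum(cand_ngrams.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence, via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    # ROUGE-L F1: based on the longest common subsequence of tokens.
    ref, cand = tokenize(reference), tokenize(candidate)
    return f1(lcs_length(ref, cand), len(ref), len(cand))


# Hypothetical example: an original report sentence and a transcription
# that differs in the spelling of one word.
original = "no evidence of acute intracranial hemorrhage"
transcribed = "no evidence of acute intracranial haemorrhage"

print(round(rouge_n(original, transcribed, n=1), 3))  # 0.833
print(round(rouge_n(original, transcribed, n=2), 3))  # 0.8
print(round(rouge_l(original, transcribed), 3))       # 0.833
```

A single spelling variant lowers all three scores even though the meaning is unchanged, which is one reason a semantic metric such as BERTScore is often reported alongside ROUGE.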
Results: No significant differences in correction time between languages were found, but differences were observed between readers based on experience. Error rates were similar for both languages. By severity, most errors were minor (92.68 %, n = 114/123 German; 94.74 %, n = 90/95 English). By type, the most frequent errors were technical (27.04 %, n = 43/159 German; 35.65 %, n = 41/115 English) and typographical (23.9 %, n = 38/159 German; 27.83 %, n = 32/115 English). BERTScore metrics were significantly higher for German, while ROUGE metrics showed no significant differences between languages.
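As a quick arithmetic check, the reported percentages can be recomputed from the counts given in the Results. The sketch below uses only numbers stated in the abstract; the dictionary labels are illustrative. Note that the severity figures use different denominators (123 and 95) than the error-type figures (159 and 115); the abstract does not explain the difference, so the code makes no assumption about it.

```python
# Counts copied from the Results paragraph: (numerator, denominator, reported %).
reported = {
    "minor, German": (114, 123, 92.68),
    "minor, English": (90, 95, 94.74),
    "technical, German": (43, 159, 27.04),
    "technical, English": (41, 115, 35.65),
    "typographical, German": (38, 159, 23.9),
    "typographical, English": (32, 115, 27.83),
}

computed = {}
for label, (count, total, pct) in reported.items():
    # Recompute the percentage and round to two decimals, as in the abstract.
    computed[label] = round(100 * count / total, 2)
    print(f"{label}: {count}/{total} = {computed[label]:.2f} % (reported {pct} %)")
```

All six recomputed values agree with the reported percentages to two decimal places.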
Conclusion: This study demonstrates the potential of GPT-4o for multilingual transcription of radiology reports, effectively handling both English and German with minimal errors and high semantic understanding. Future research should compare GPT-4o with current radiology dictation tools, assessing performance, cost-effectiveness, and multilingual capabilities across diverse speaker populations.
About the journal:
European Journal of Radiology is an international journal that aims to communicate state-of-the-art information on imaging developments to its readers through high-quality original research articles and timely reviews of current developments in the field.
Its audience includes clinicians at all levels of training, including radiology trainees, newly qualified imaging specialists, and experienced radiologists. Its aim is to inform efficient, appropriate, and evidence-based imaging practice to the benefit of patients worldwide.