Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports

Academic Radiology | IF 3.9 | Zone 2 (Medicine) | JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging) | Pub Date: 2024-12-09 | DOI: 10.1016/j.acra.2024.10.050

Na Yeon Han, Keewon Shin, Min Ju Kim MD, PhD, Beom Jin Park, Ki Choon Sim, Yeo Eun Han, Deuk Jae Sung, Jae Woong Choi, Suk Keu Yeom

Academic Radiology, Volume 32, Issue 5, Pages 2385-2391. Journal Article. https://www.sciencedirect.com/science/article/pii/S1076633224008377

Citations: 0

Abstract

Rationale and Objectives

We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention.

Materials and Methods

This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), “benign,” “no tumor description,” and “other malignancy.” The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between the two radiological reports and accurately reflect these categories.
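The six-level ground-truth categorization described above can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the names `FindingCategory`, `TUMOR_RELATED`, `is_tumor_related`, and `status_accuracy` are hypothetical, and treating only the three tumor statuses (and not “other malignancy”) as tumor-related is an assumption made here.

```python
from enum import Enum


class FindingCategory(Enum):
    """Six-level categorization named in the abstract (labels are illustrative)."""
    TUMOR_IMPROVED = "tumor: improved"
    TUMOR_STABLE = "tumor: stable"
    TUMOR_AGGRAVATED = "tumor: aggravated"
    BENIGN = "benign"
    NO_TUMOR_DESCRIPTION = "no tumor description"
    OTHER_MALIGNANCY = "other malignancy"


# Assumption: "tumor-related" means one of the three tumor statuses.
TUMOR_RELATED = {
    FindingCategory.TUMOR_IMPROVED,
    FindingCategory.TUMOR_STABLE,
    FindingCategory.TUMOR_AGGRAVATED,
}


def is_tumor_related(category: FindingCategory) -> bool:
    """True if the category is one of the three tumor statuses."""
    return category in TUMOR_RELATED


def status_accuracy(truth: list[FindingCategory],
                    predicted: list[FindingCategory]) -> float:
    """Fraction of tumor-related ground-truth findings whose predicted
    category matches the consensus category exactly."""
    pairs = [(t, p) for t, p in zip(truth, predicted) if is_tumor_related(t)]
    if not pairs:
        return 0.0
    return sum(t == p for t, p in pairs) / len(pairs)


# Made-up toy labels, not study data.
truth = [FindingCategory.TUMOR_STABLE, FindingCategory.BENIGN,
         FindingCategory.TUMOR_AGGRAVATED]
predicted = [FindingCategory.TUMOR_STABLE, FindingCategory.BENIGN,
             FindingCategory.TUMOR_IMPROVED]
print(status_accuracy(truth, predicted))  # 1 of 2 tumor-related findings match
```

Keeping the categories in an enumeration makes the split between "is this finding tumor-related?" and "which tumor status does it have?" explicit, which mirrors the two levels at which the models are compared.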

Results

The proportion of correctly matched findings between serial reports was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For oncological issue identification, GPT-4 and Gemini achieved, respectively, a precision of 0.68 and 0.63 (P = 0.006), a recall of 0.91 and 0.80 (P < 0.001), and F1-scores of 0.78 and 0.70 in determining tumor-related findings. GPT-4 was also more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001).
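As a quick consistency check, the reported F1-scores can be recomputed from the reported precision and recall, assuming the standard definition of F1 as their harmonic mean (the abstract does not state the formula used). The function name `f1_score` and the `reported` dictionary are illustrative; the numbers are taken from the results above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 as reported in the abstract.
reported = {
    "GPT-4": {"precision": 0.68, "recall": 0.91, "f1": 0.78},
    "Gemini": {"precision": 0.63, "recall": 0.80, "f1": 0.70},
}

for model, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{model}: computed F1 = {computed:.2f}, reported F1 = {m['f1']:.2f}")
```

Under that assumption, the recomputed values agree with the reported F1-scores to two decimal places.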

Conclusion

Using a carefully engineered prompt, this study demonstrated the potential of LLM-assisted analysis of serial radiology reports to enhance oncological surveillance. GPT-4 outperformed Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.

Source journal

Academic Radiology (Medicine: Nuclear Medicine)

CiteScore: 7.60

Self-citation rate: 10.40%

Articles published: 432

Review time: 18 days

About the journal: Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.