通过大语言模型辅助分析加强肿瘤监测:GPT-4和Gemini在评估一系列腹部CT扫描报告中肿瘤问题的比较研究。

IF 3.8 2区 医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING Academic Radiology Pub Date : 2024-12-09 DOI:10.1016/j.acra.2024.10.050
Na Yeon Han, Keewon Shin, Min Ju Kim, Beom Jin Park, Ki Choon Sim, Yeo Eun Han, Deuk Jae Sung, Jae Woong Choi, Suk Keu Yeom
{"title":"通过大语言模型辅助分析加强肿瘤监测:GPT-4和Gemini在评估一系列腹部CT扫描报告中肿瘤问题的比较研究。","authors":"Na Yeon Han, Keewon Shin, Min Ju Kim, Beom Jin Park, Ki Choon Sim, Yeo Eun Han, Deuk Jae Sung, Jae Woong Choi, Suk Keu Yeom","doi":"10.1016/j.acra.2024.10.050","DOIUrl":null,"url":null,"abstract":"<p><strong>Rationale and objectives: </strong>We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention.</p><p><strong>Materials and methods: </strong>This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), \"benign\", \"no tumor description,\" and \"other malignancy.\" The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories.</p><p><strong>Results: </strong>In terms of accuracy in matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For oncological issue identification, the precision for tumor-related finding determinations, recall, and F1-scores were 0.68 and 0.63 (P = 0.006), 0.91 and 0.80 (P < 0.001), and 0.78 and 0.70 for GPT-4 and Gemini, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001).</p><p><strong>Conclusion: </strong>This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.</p>","PeriodicalId":50928,"journal":{"name":"Academic Radiology","volume":" ","pages":""},"PeriodicalIF":3.8000,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports.\",\"authors\":\"Na Yeon Han, Keewon Shin, Min Ju Kim, Beom Jin Park, Ki Choon Sim, Yeo Eun Han, Deuk Jae Sung, Jae Woong Choi, Suk Keu Yeom\",\"doi\":\"10.1016/j.acra.2024.10.050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Rationale and objectives: </strong>We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention.</p><p><strong>Materials and methods: </strong>This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), \\\"benign\\\", \\\"no tumor description,\\\" and \\\"other malignancy.\\\" The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories.</p><p><strong>Results: </strong>In terms of accuracy in matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For oncological issue identification, the precision for tumor-related finding determinations, recall, and F1-scores were 0.68 and 0.63 (P = 0.006), 0.91 and 0.80 (P < 0.001), and 0.78 and 0.70 for GPT-4 and Gemini, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001).</p><p><strong>Conclusion: </strong>This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.</p>\",\"PeriodicalId\":50928,\"journal\":{\"name\":\"Academic Radiology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Academic Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.acra.2024.10.050\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Academic Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.acra.2024.10.050","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0

摘要

基本原理和目的:我们旨在比较两种领先的大型语言模型(llm) GPT-4和Gemini在分析系列放射学报告方面的能力,以突出需要进一步临床关注的肿瘤问题。材料和方法:本研究纳入205例患者,每位患者均有两次连续的放射学报告。我们设计了一个由三步任务组成的提示,使用llm分析报告发现。为了建立一个基本的事实,两位放射科医生就六个级别的分类达成了共识,包括肿瘤发现(分类为改善、稳定或恶化)、“良性”、“无肿瘤描述”和“其他恶性”。然后比较GPT-4和Gemini的表现,基于它们在两份放射报告之间匹配相应发现的能力,并准确反映这些类别。结果:在系列报告之间匹配结果的准确性方面,GPT-4的正确匹配比例(96.2%)明显高于Gemini (91.7%) (P结论:本研究证明了llm辅助分析系列放射学报告在加强肿瘤监测方面的潜力,使用精心设计的提示。与Gemini相比,GPT-4在匹配相应发现、识别肿瘤相关发现和准确判断肿瘤状态方面表现优越。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports.

Rationale and objectives: We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention.

Materials and methods: This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), "benign", "no tumor description," and "other malignancy." The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories.

Results: In terms of accuracy in matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For oncological issue identification, the precision for tumor-related finding determinations, recall, and F1-scores were 0.68 and 0.63 (P = 0.006), 0.91 and 0.80 (P < 0.001), and 0.78 and 0.70 for GPT-4 and Gemini, respectively. GPT-4 was more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001).

Conclusion: This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Academic Radiology
Academic Radiology 医学-核医学
CiteScore
7.60
自引率
10.40%
发文量
432
审稿时长
18 days
期刊介绍: Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.
期刊最新文献
Machine Learning Model for Risk Stratification of Papillary Thyroid Carcinoma Based on Radiopathomics. Non-invasive Assessment of Human Epidermal Growth Factor Receptor 2 Expression in Gastric Cancer Based on Deep Learning: A Computed Tomography-based Multicenter Study. Prediction of Radiation Therapy Induced Cardiovascular Toxicity from Pretreatment CT Images in Patients with Thoracic Malignancy via an Optimal Biomarker Approach. Unlocking Innovation: Promoting Scholarly Endeavors During Radiology Residency. Longitudinal Assessment of Pulmonary Involvement and Prognosis in Different Subtypes of COVID-19 Patients After One Year Using Low-Dose CT: A Prospective Observational Study.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1