Enhanced Artificial Intelligence in Bladder Cancer Management: A Comparative Analysis and Optimization Study of Multiple Large Language Models.

IF 2.9 | Zone 2 (Medicine) | Q1, Urology & Nephrology | Journal of Endourology | Pub Date: 2025-03-18 | DOI: 10.1089/end.2024.0860
Kun-Peng Li, Li Wang, Shun Wan, Chen-Yang Wang, Si-Yu Chen, Shan-Hui Liu, Li Yang
Citations: 0

Abstract

Background: With the rapid advancement of artificial intelligence in health care, large language models (LLMs) demonstrate increasing potential in medical applications. However, their performance in specialized oncology remains limited. This study evaluates the performance of multiple leading LLMs in addressing clinical inquiries related to bladder cancer (BLCA) and demonstrates how strategic optimization can overcome these limitations.

Methods: We developed a comprehensive set of 100 clinical questions based on established guidelines. These questions encompassed the epidemiology, diagnosis, treatment, prognosis, and follow-up aspects of BLCA management. Six LLMs (Claude-3.5-Sonnet, ChatGPT-4.0, Grok-beta, Gemini-1.5-Pro, Mistral-Large-2, and GPT-3.5-Turbo) were each tested in three independent trials. The responses were validated against current clinical guidelines and expert consensus. We implemented a two-phase training optimization process specifically for GPT-3.5-Turbo to enhance its performance.

Results: In the initial evaluation, Claude-3.5-Sonnet demonstrated the highest accuracy (89.33% ± 1.53%), followed by ChatGPT-4.0 (85.67% ± 1.15%). Grok-beta achieved 84.33% ± 1.53% accuracy, whereas Gemini-1.5-Pro and Mistral-Large-2 showed similar performance (82.00% ± 1.00% and 81.00% ± 1.00%, respectively). GPT-3.5-Turbo demonstrated the lowest accuracy (74.33% ± 3.06%). After the first phase of training, GPT-3.5-Turbo's accuracy improved to 86.67% ± 1.89%. Following the second phase of optimization, the model achieved 100% accuracy.

Conclusion: This study not only establishes the comparative performance of various LLMs in BLCA-related queries but also validates the potential for significant improvement through targeted training optimization. The successful enhancement of GPT-3.5-Turbo's performance suggests that strategic model refinement can overcome initial limitations and achieve optimal accuracy in specialized medical applications.
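The accuracies above are given in the form "value ± spread" over three independent trials of 100 questions each. The abstract does not state which dispersion measure was used, so the following is only a minimal sketch of how such a figure could be computed, assuming it is the mean and sample standard deviation of per-trial accuracy. The function names and the per-trial counts are hypothetical and are not taken from the study.

```python
import statistics


def summarize_accuracy(correct_counts, n_questions):
    """Return (mean, sample SD) of per-trial accuracy, in percent.

    correct_counts: number of correct answers in each trial (hypothetical input).
    n_questions: number of questions asked per trial.
    """
    # Convert each trial's correct-answer count to a percentage.
    percentages = [100.0 * c / n_questions for c in correct_counts]
    mean_pct = statistics.mean(percentages)
    # Assumption: sample standard deviation (n - 1 denominator).
    # statistics.pstdev would give the population version instead.
    sd_pct = statistics.stdev(percentages)
    return mean_pct, sd_pct


def format_accuracy(correct_counts, n_questions):
    """Format the summary in the 'xx.xx% ± y.yy%' style used in the abstract."""
    mean_pct, sd_pct = summarize_accuracy(correct_counts, n_questions)
    return f"{mean_pct:.2f}% ± {sd_pct:.2f}%"


if __name__ == "__main__":
    # Illustrative counts only; the study's per-trial counts are not reported here.
    print(format_accuracy([90, 92, 91], 100))
```

With only three trials, the choice between the sample and population standard deviation changes the reported spread noticeably, so it is worth stating which one is used when reporting results of this kind.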

Source journal
Journal of Endourology (Medicine: Urology & Nephrology)
CiteScore: 5.50
Self-citation rate: 14.80%
Articles published: 254
Review time: 1 month
About the journal: Journal of Endourology, JE Case Reports, and Videourology are the leading peer-reviewed journal, case reports publication, and innovative videojournal companion covering all aspects of minimally invasive urology research, applications, and clinical outcomes. The leading journal of minimally invasive urology for over 30 years, Journal of Endourology is the essential publication for practicing surgeons who want to keep up with the latest surgical technologies in endoscopic, laparoscopic, robotic, and image-guided procedures as they apply to benign and malignant diseases of the genitourinary tract. This flagship journal includes the companion videojournal Videourology™ with every subscription. While Journal of Endourology remains focused on publishing rigorously peer-reviewed articles, Videourology accepts original videos containing material that has not been reported elsewhere, except in the form of an abstract or a conference presentation. Journal of Endourology coverage includes: the latest laparoscopic, robotic, endoscopic, and image-guided techniques for treating both benign and malignant conditions; pioneering research articles; controversial cases in endourology; techniques in endourology with accompanying videos; reviews and epochs in endourology; and an endourology survey section of endourology-relevant manuscripts published in other journals.
Latest articles from this journal
Medium-Term Outcomes after Primary Whole-Gland High-Intensity Focused Ultrasound Ablation for the Treatment of Prostate Cancer: A Single-Center Experience.
Robot-Assisted Laparoscopic Pyeloplasty in Infants Under 3 Months: Single-Institution Study Findings, Safety Measures, and Success Strategies.
Initial Urological Surgery Using a New Domestic Single-Port Surgical Robotic System.
Ureteroscopy and Laser Lithotripsy for Large (≥2 cm) Upper Tract Urinary Stones in Pediatric Patients: Outcomes from a Pediatric Endourology Referral Center.
Enhanced Artificial Intelligence in Bladder Cancer Management: A Comparative Analysis and Optimization Study of Multiple Large Language Models.