Leveraging Guideline-Based Clinical Decision Support Systems with Large Language Models: A Case Study with Breast Cancer.

Methods of Information in Medicine (IF 1.3; CAS Zone 4, Medicine; JCR Q3, Computer Science, Information Systems) Pub Date: 2025-01-29 DOI: 10.1055/a-2528-4299
Solène Delourme, Akram Redjdal, Jacques Bouaud, Brigitte Seroussi

Abstract

Background: Multidisciplinary tumor boards (MTBs) have been established in most countries to allow experts to collaboratively determine the best treatment decisions for cancer patients. However, MTBs often face challenges such as case overload, which can compromise the quality of their decisions. Clinical decision support systems (CDSSs) have been introduced to assist clinicians in this process. Despite their potential, CDSSs are still underutilized in routine practice. The emergence of large language models (LLMs), such as ChatGPT, offers new opportunities to improve the efficiency and usability of traditional CDSSs.

Objectives: OncoDoc2 is a guideline-based CDSS developed using a documentary approach and applied to breast cancer management. This study aims to evaluate the potential of LLMs, used as question-answering (QA) systems, to improve the usability of OncoDoc2 across different prompt engineering techniques (PETs).

Methods: Data extracted from breast cancer patient summaries (BCPSs), together with questions formulated by OncoDoc2, were used to create prompts for various LLMs, and several PETs were designed and tested. Using a sample of 200 randomized BCPSs, LLMs and PETs were initially compared on their responses to OncoDoc2 questions using classic metrics (accuracy, precision, recall, and F1 score). Best performing LLMs and PETs were further assessed by comparing the therapeutic recommendations generated by OncoDoc2, based on LLM inputs, to those provided by MTB clinicians using OncoDoc2. Finally, the best performing method was validated using a new sample of 30 randomized BCPSs.
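The abstract does not reproduce the prompts themselves. As a minimal sketch of how patient data and a decision-tree question could be combined into a plain zero-shot prompt, the following uses hypothetical field names, a hypothetical question, and hypothetical answer choices; none of them come from OncoDoc2 or from the study.

```python
def build_zero_shot_prompt(patient_data, question, choices):
    """Assemble a plain zero-shot QA prompt (illustrative layout only)."""
    # One "key: value" line per extracted patient item.
    summary_lines = [f"- {key}: {value}" for key, value in patient_data.items()]
    return "\n".join(
        [
            "Patient summary:",
            *summary_lines,
            "",
            f"Question: {question}",
            "Answer with exactly one of: " + ", ".join(choices),
        ]
    )


# Hypothetical inputs, invented for illustration.
example_patient = {"age": 62, "tumor_size_mm": 18, "menopausal_status": "postmenopausal"}
example_prompt = build_zero_shot_prompt(
    example_patient,
    "Is the tumor larger than 20 mm?",
    ["yes", "no"],
)
print(example_prompt)
```

Restricting the model to a closed set of answers keeps its output directly usable as an input to a decision tree, which is the role the LLM plays in the study design described above.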

Results: The combination of the Mistral and OpenChat models under the enhanced zero-shot PET showed the best performance as a question-answering system. This approach achieved a precision of 60.16%, a recall of 54.18%, an F1 score of 56.59%, and an accuracy of 75.57% on the validation set of 30 BCPSs. However, it yielded poor results as a CDSS, with only 16.67% of the recommendations generated by OncoDoc2 from LLM inputs matching the gold standard.
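The abstract does not state how the four metrics were aggregated. The reported F1 score (56.59%) differs from the harmonic mean of the reported precision and recall (about 57.01%), which would be consistent with averaging per-class or per-question scores, although that is an assumption rather than something the source confirms. The sketch below computes macro-averaged metrics on invented toy labels to show why an averaged F1 need not equal the harmonic mean of the averaged precision and recall.

```python
def macro_metrics(gold, pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels."""
    labels = sorted(set(gold) | set(pred))
    pairs = list(zip(gold, pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for g, p in pairs if g == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy answers, invented for illustration (not study data).
gold = ["yes", "no", "yes", "no"]
pred = ["yes", "yes", "yes", "no"]
scores = macro_metrics(gold, pred)

# Harmonic mean of the macro precision and recall, for comparison with macro F1.
harmonic = 2 * scores["precision"] * scores["recall"] / (scores["precision"] + scores["recall"])
print(scores, harmonic)
```

On this toy input the macro F1 and the harmonic mean of macro precision and recall come out different, mirroring the small gap seen in the reported figures.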

Conclusions: All the criteria in the OncoDoc2 decision tree are crucial for capturing the uniqueness of each patient. Any deviation from a criterion alters the recommendations generated. Although a good accuracy of 75.57% was achieved, LLMs still face challenges in reliably understanding complex medical contexts and in being effective as CDSSs.
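One way to see how a 75.57% answer accuracy can coexist with a 16.67% recommendation match rate (5 of 30 cases, if the denominator is the 30 validation BCPSs) is to treat each answer along a decision-tree path as an independent trial. Both the independence assumption and the path lengths below are illustrative; the abstract does not report how many questions a typical OncoDoc2 path contains.

```python
import math


def path_success_probability(per_answer_accuracy, path_length):
    """Probability that every answer on a path is correct, assuming independence."""
    return per_answer_accuracy ** path_length


def implied_path_length(per_answer_accuracy, target_match_rate):
    """Path length at which the all-correct probability falls to the target rate."""
    return math.log(target_match_rate) / math.log(per_answer_accuracy)


accuracy = 0.7557    # reported answer-level accuracy
match_rate = 0.1667  # reported share of recommendations matching the gold standard

# Hypothetical path lengths, chosen only to show the trend.
for length in (1, 3, 6, 10):
    print(length, round(path_success_probability(accuracy, length), 4))

print(implied_path_length(accuracy, match_rate))
```

Under these assumptions, a path of roughly six to seven questions would be enough to bring the all-correct probability down to the reported match rate, which illustrates the point that a single wrong criterion changes the recommendation.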

Source journal
Methods of Information in Medicine (Medicine; Computer Science, Information Systems)
CiteScore: 3.70
Self-citation rate: 11.80%
Articles published: 33
Review time: 6-12 weeks
Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.
Latest articles from this journal
Deciphering Abbreviations in Malaysian Clinical Notes Using Machine Learning.
The Significance of Information Quality for the Secondary Use of the Information in the National Health Care Quality Registers in Finland.
Leveraging Guideline-Based Clinical Decision Support Systems with Large Language Models: A Case Study with Breast Cancer.
Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop.
Artificial Intelligence-Based Prediction of Contrast Medium Doses for Computed Tomography Angiography Using Optimized Clinical Parameter Sets.