Leveraging Guideline-Based Clinical Decision Support Systems with Large Language Models: A Case Study with Breast Cancer.

IF 1.8; Zone 4 (Medicine); JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Methods of Information in Medicine; Pub Date: 2024-09-01; Epub Date: 2025-01-29; DOI: 10.1055/a-2528-4299

Solène Delourme, Akram Redjdal, Jacques Bouaud, Brigitte Seroussi

Pages: 85-96. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133322/pdf/

Citations: 0

Abstract

Background: Multidisciplinary tumor boards (MTBs) have been established in most countries to allow experts to collaboratively determine the best treatment decisions for cancer patients. However, MTBs often face challenges such as case overload, which can compromise the quality of their decisions. Clinical decision support systems (CDSSs) have been introduced to assist clinicians in this process. Despite their potential, CDSSs are still underutilized in routine practice. The emergence of large language models (LLMs), such as ChatGPT, offers new opportunities to improve the efficiency and usability of traditional CDSSs.

Objectives: OncoDoc2 is a guideline-based CDSS developed using a documentary approach and applied to breast cancer management. This study aims to evaluate the potential of LLMs, used as question-answering (QA) systems, to improve the usability of OncoDoc2 across different prompt engineering techniques (PETs).

Methods: Data extracted from breast cancer patient summaries (BCPSs), together with questions formulated by OncoDoc2, were used to create prompts for various LLMs, and several PETs were designed and tested. Using a sample of 200 randomized BCPSs, LLMs and PETs were first compared on their responses to OncoDoc2 questions using classic metrics (accuracy, precision, recall, and F1 score). The best-performing LLMs and PETs were further assessed by comparing the therapeutic recommendations generated by OncoDoc2, based on LLM inputs, to those provided by MTB clinicians using OncoDoc2. Finally, the best-performing method was validated using a new sample of 30 randomized BCPSs.

Results: The combination of Mistral and OpenChat models under the enhanced Zero-Shot PET showed the best performance as a question-answering system. This approach achieved a precision of 60.16%, a recall of 54.18%, an F1 score of 56.59%, and an accuracy of 75.57% on the validation set of 30 BCPSs. However, this approach yielded poor results as a CDSS, with only 16.67% of the recommendations generated by OncoDoc2 based on LLM inputs matching the gold standard.
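
The abstract does not say how the four metrics were aggregated across answer options. As a minimal sketch, assuming each OncoDoc2 question is scored as a multi-class labeling task and that precision, recall, and F1 are macro-averaged over answer labels (an assumption, not the authors' stated method), the computation could look like the following. The function name `qa_metrics` and the example labels are hypothetical.

```python
from collections import Counter


def qa_metrics(gold, pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    gold, pred: equal-length sequences of answer labels, one per question.
    Macro-averaging is an assumption of this sketch, not taken from the paper.
    """
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")

    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct answer for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was expected but missed

    def safe_div(num, den):
        # Convention: a metric with an empty denominator counts as 0.
        return num / den if den else 0.0

    precisions = [safe_div(tp[l], tp[l] + fp[l]) for l in labels]
    recalls = [safe_div(tp[l], tp[l] + fn[l]) for l in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical answers to four questions, for illustration only.
    gold = ["yes", "no", "yes", "unknown"]
    pred = ["yes", "yes", "yes", "no"]
    print(qa_metrics(gold, pred))
```

Under macro-averaging, the averaged F1 is generally not the harmonic mean of the averaged precision and recall, so the three reported figures need not satisfy that identity.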

Conclusion: All the criteria in the OncoDoc2 decision tree are crucial for capturing the uniqueness of each patient. Any deviation from a criterion alters the recommendations generated. Despite achieving a good accuracy rate of 75.57%, LLMs still face challenges in reliably understanding complex medical contexts and in being effective as CDSSs.
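
One way to see why a 75.57% per-answer accuracy can coexist with a much lower recommendation match rate is to look at how errors compound along a decision path. The sketch below is illustrative only: it assumes that answers are independent and equally likely to be correct, and that a recommendation matches only when every criterion on the path is answered correctly. Neither assumption comes from the paper, and the path depths used are hypothetical.

```python
def all_criteria_correct_probability(per_answer_accuracy, n_criteria):
    """Probability that every criterion on a path is answered correctly.

    Assumes independent answers with identical accuracy (illustrative only).
    """
    if not 0.0 <= per_answer_accuracy <= 1.0:
        raise ValueError("per_answer_accuracy must be between 0 and 1")
    if n_criteria < 0:
        raise ValueError("n_criteria must be non-negative")
    return per_answer_accuracy ** n_criteria


if __name__ == "__main__":
    accuracy = 0.7557  # per-answer accuracy reported in the abstract
    for depth in (1, 3, 6, 10):  # hypothetical path lengths
        prob = all_criteria_correct_probability(accuracy, depth)
        print(f"{depth:>2} criteria: {prob:.4f}")
```

Under these assumptions the chance of an error-free path shrinks geometrically with the number of criteria, which is in line with the abstract's point that any single deviation changes the recommendation.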




Source journal

Methods of Information in Medicine (Medicine / Computer Science: Information Systems)

CiteScore

3.70

Self-citation rate

11.80%

Articles published

Review time

6-12 weeks

About the journal: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.