Proof-of-concept study of a small language model chatbot for breast cancer decision support - a transparent, source-controlled, explainable and data-secure approach.

Journal of Cancer Research and Clinical Oncology (IF 2.8; Zone 3, Medicine; JCR Q3, Oncology). Pub Date: 2024-10-09. DOI: 10.1007/s00432-024-05964-3

Sebastian Griewing, Fabian Lechner, Niklas Gremke, Stefan Lukac, Wolfgang Janni, Markus Wallwiener, Uwe Wagner, Martin Hirsch, Sebastian Kuhn

Volume 150, issue 10, 451. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464535/pdf/

Citations: 0

Abstract

Purpose: Large language models (LLMs) show potential for decision support in breast cancer care. Their clinical use is currently precluded by a lack of control over the sources used for decision-making, limited explainability of the decision-making process, and health data security concerns. Recently developed small language models (SLMs) have been proposed as a way to address these challenges. This preclinical proof-of-concept study tailors an open-source SLM to the German breast cancer guideline (BC-SLM) and evaluates its initial clinical accuracy and technical functionality in a simulation.

Methods: A multidisciplinary tumor board (MTB) serves as the gold standard. Initial clinical accuracy is assessed as the concordance of the BC-SLM with the MTB and compared with that of two publicly available LLMs, ChatGPT3.5 and ChatGPT4. The study includes 20 fictional patient profiles and recommendations for 5 treatment modalities, resulting in 100 binary treatment recommendations (recommended or not recommended). Statistical evaluation comprises percentage concordance with the MTB and Cohen's kappa statistic (κ). Technical functionality is assessed qualitatively in terms of local hosting, adherence to the guideline, and information retrieval.
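The two statistics named in the Methods can be illustrated with a minimal sketch. The function name `concordance_and_kappa` and the ten toy labels are hypothetical and are not the study's data or code; the sketch only assumes that each recommendation is encoded as 1 (recommended) or 0 (not recommended) for both the MTB and the model.

```python
from collections import Counter


def concordance_and_kappa(reference, model):
    """Return (percent agreement, Cohen's kappa) for two equal-length label lists."""
    if len(reference) != len(model) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(reference)

    # Observed agreement: share of cases where both raters give the same label.
    p_o = sum(r == m for r, m in zip(reference, model)) / n

    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    ref_counts, model_counts = Counter(reference), Counter(model)
    labels = set(ref_counts) | set(model_counts)
    p_e = sum(ref_counts[k] * model_counts[k] for k in labels) / n**2

    if p_e == 1.0:
        # Both raters used a single identical label throughout; kappa is undefined.
        return 100.0 * p_o, float("nan")
    return 100.0 * p_o, (p_o - p_e) / (1.0 - p_e)


# Toy example (invented labels): 1 = recommended, 0 = not recommended.
mtb_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
pct, kappa = concordance_and_kappa(mtb_labels, model_labels)
print(f"concordance = {pct:.1f}%, kappa = {kappa:.3f}")  # concordance = 80.0%, kappa = 0.600
```

Kappa discounts the agreement that two raters would reach by chance given how often each one says "recommended", which is why it is reported alongside the raw percentage.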

Results: Overall concordance is 86% for BC-SLM (κ = 0.721, p < 0.001), 90% for ChatGPT4 (κ = 0.820, p < 0.001), and 83% for ChatGPT3.5 (κ = 0.661, p < 0.001). Concordance per treatment modality ranges from 65% to 100% for BC-SLM, 85% to 100% for ChatGPT4, and 55% to 95% for ChatGPT3.5. The BC-SLM runs locally, adheres to the standards of the German breast cancer guideline, and provides references to the sections underlying its decisions.
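As a reading aid, the reported percentages and κ values can be related through the standard definition of Cohen's kappa. The worked step below assumes that each overall κ was computed on the pooled 100 recommendations and that the published figures are rounded; the chance-agreement values $p_e$ are back-calculated here and are not reported in the abstract.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa}

\text{BC-SLM:}\quad     p_e \approx \frac{0.86 - 0.721}{1 - 0.721} = \frac{0.139}{0.279} \approx 0.498
\text{ChatGPT4:}\quad   p_e \approx \frac{0.90 - 0.820}{1 - 0.820} = \frac{0.080}{0.180} \approx 0.444
\text{ChatGPT3.5:}\quad p_e \approx \frac{0.83 - 0.661}{1 - 0.661} = \frac{0.169}{0.339} \approx 0.499
```

Under these assumptions, roughly half of the agreement would be expected by chance, which is why the κ values sit well below the raw concordance percentages.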

Conclusion: The tailored BC-SLM shows initial clinical accuracy and technical functionality, with concordance with the MTB comparable to that of publicly available LLMs such as ChatGPT4 and ChatGPT3.5. This serves as a proof of concept for adapting an SLM to an oncological disease and its guideline in order to address prevailing issues with LLMs by ensuring decision transparency, explainability, source control, and data security. It represents a necessary step towards clinical validation and safe use of language models in clinical oncology.

Source journal

Journal of Cancer Research and Clinical Oncology (Medicine: Oncology)

CiteScore: 4.00
Self-citation rate: 2.80%
Articles published: 577
Review time: 2 months

Journal description: The "Journal of Cancer Research and Clinical Oncology" publishes significant and up-to-date articles within the fields of experimental and clinical oncology. The journal, which is chiefly devoted to Original papers, also includes Reviews as well as Editorials and Guest editorials on current, controversial topics. The section Letters to the editors provides a forum for a rapid exchange of comments and information concerning previously published papers and topics of current interest. Meeting reports provide current information on the latest results presented at important congresses. The following fields are covered: carcinogenesis (etiology, mechanisms); molecular biology; recent developments in tumor therapy; general diagnosis; laboratory diagnosis; diagnostic and experimental pathology; oncologic surgery; and epidemiology.