Empowering large language models for automated clinical assessment with generation-augmented retrieval and hierarchical chain-of-thought

Artificial Intelligence in Medicine, Volume 162, Article 103078 · IF 6.2 · JCR Q1 (Computer Science, Artificial Intelligence) · SCI Region 2 (Medicine) · Published: 2025-04-01 (Epub: 2025-02-12) · DOI: 10.1016/j.artmed.2025.103078
Zhanzhong Gu , Wenjing Jia , Massimo Piccardi , Ping Yu
Citations: 0

Abstract

Background:

Understanding and extracting valuable information from electronic health records (EHRs) is important for improving healthcare delivery and health outcomes. Large language models (LLMs) have demonstrated significant proficiency in natural language understanding and processing, offering promise for automating the typically labor-intensive and time-consuming analytical tasks involving EHRs. Despite the active application of LLMs in healthcare settings, many foundation models lack real-world healthcare relevance, and applying LLMs to EHRs is still in its early stages. To advance this field, in this study we pioneer a generation-augmented prompting paradigm, "GAPrompt", to empower generic LLMs for automated clinical assessment, in particular quantitative stroke severity assessment, using data extracted from EHRs.

Methods:

The GAPrompt paradigm comprises five components: (i) prompt-driven selection of LLMs, (ii) generation-augmented construction of a knowledge base, (iii) summary-based generation-augmented retrieval (SGAR), (iv) inferencing with a hierarchical chain-of-thought (HCoT), and (v) ensembling of multiple generations.
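The online portion of the five components above (SGAR retrieval, HCoT inference, and ensembling) can be sketched as a simple pipeline. The function names, prompt wording, and the `llm`/`retrieve` callables below are illustrative assumptions, not the paper's implementation; components (i) and (ii) are offline preparation steps and are represented only by the pre-built retrieval function.

```python
import re


def parse_score(text):
    """Extract the last integer in a generation as the predicted severity score."""
    matches = re.findall(r"\d+", text)
    return int(matches[-1]) if matches else None


def gaprompt_assess(ehr_note, llm, retrieve, n_generations=5):
    """Hypothetical sketch of the GAPrompt inference stages (iii)-(v)."""
    # (iii) SGAR: summarize the record first, then retrieve task knowledge
    # and demonstration examples using the summary rather than the raw note.
    summary = llm("Summarize the key clinical findings:\n" + ehr_note)
    knowledge, demos = retrieve(summary)

    # (iv) HCoT: prompt for item-level reasoning before the overall score.
    prompt = ("Knowledge:\n" + knowledge + "\nExamples:\n" + demos +
              "\nAssess each severity item step by step, then state the total score."
              "\nRecord:\n" + ehr_note)

    # (v) Ensembling: sample several generations and majority-vote the score.
    scores = [parse_score(llm(prompt)) for _ in range(n_generations)]
    return max(set(scores), key=scores.count)
```

With a deterministic LLM the vote is trivial; with sampling enabled, agreement across generations is what suppresses outlier (hallucinated) scores.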

Results:

GAPrompt addresses the limitations of generic LLMs in clinical applications in a progressive manner. It efficiently evaluates the applicability of LLMs to specific tasks through LLM selection prompting, enhances their understanding of task-specific knowledge from the constructed knowledge base, improves the accuracy of knowledge and demonstration retrieval via SGAR, elevates LLM inference precision through HCoT, and enhances generation robustness while reducing LLM hallucinations via ensembling. Experimental results demonstrate the capability of our method to empower LLMs to automatically assess EHRs and generate quantitative clinical assessment results.
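The ensembling step credited above with improving robustness and reducing hallucinations can be approximated by a majority vote over the scores parsed from multiple generations. The helper below is an assumed aggregation rule, not the paper's exact one; it discards unparseable generations and breaks ties in favor of the first-seen score.

```python
from collections import Counter


def majority_vote(scores):
    """Return the most frequent parsed score across generations.

    `None` entries (generations where no score could be parsed) are dropped.
    On a tie, Counter.most_common preserves insertion order, so the
    first-seen of the tied scores wins.
    """
    counts = Counter(s for s in scores if s is not None)
    return counts.most_common(1)[0][0] if counts else None
```

The intuition is that a hallucinated score is unlikely to recur across independent samples, so agreement filters it out.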

Conclusion:

Our study highlights the feasibility of enhancing the capabilities of foundation LLMs for medical domain-specific tasks, i.e., automated quantitative analysis of EHRs, addressing the challenges of the labor-intensive and often manually conducted quantitative assessment of stroke in clinical practice and research. This approach offers a practical and accessible paradigm for researchers and industry practitioners seeking to leverage the power of LLMs in domain-specific applications, and its utility extends beyond the medical domain to a wide range of fields.
About the journal

Artificial Intelligence in Medicine (Engineering: Biomedical) · CiteScore: 15.00 · Self-citation rate: 2.70% · Annual publications: 143 · Review time: 6.3 months

Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.