From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents

Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko Van Treeck, Sonja Katharina Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn, Jakob Nikolas Kather
medRxiv - Gastroenterology, published 2023-12-08
DOI: 10.1101/2023.12.07.23299648 (https://doi.org/10.1101/2023.12.07.23299648)
Citation count: 0

Abstract

From Text to Tables: A Local Privacy Preserving Large Language Model for Structured Information Retrieval from Medical Documents
Background and Aims
Most clinical information is encoded as text, but extracting quantitative information from text is challenging. Large Language Models (LLMs) have emerged as powerful tools for natural language processing and can parse clinical text. However, many LLMs, including ChatGPT, reside in remote data centers, which disqualifies them from processing personal healthcare data. We present an open-source pipeline that uses the local LLM 'Llama 2' to extract quantitative information from clinical text, and we evaluate its use to detect clinical features of decompensated liver cirrhosis.

Methods
We tasked the LLM with identifying five key clinical features of decompensated liver cirrhosis in a zero-shot and one-shot manner, without any model training. Our specific objective was to identify abdominal pain, shortness of breath, confusion, liver cirrhosis, and ascites in 500 patient medical histories from the MIMIC IV dataset. We compared LLMs of three different sizes and a variety of pre-specified prompt engineering approaches. Model predictions were compared against the ground truth provided by the consensus of three blinded medical experts.

Results
Our open-source pipeline yielded highly accurate extraction of quantitative features from medical free text. Clinical features that were explicitly mentioned in the source text, such as liver cirrhosis and ascites, were detected by the 70 billion parameter model with a sensitivity of 100% and 95% and a specificity of 96% and 95%, respectively. Other clinical features, which are often paraphrased in a variety of ways, were detected less reliably: the presence of confusion was detected with a sensitivity of only 76% and a specificity of 94%. Abdominal pain was detected with a sensitivity of 84% and a specificity of 97%. Shortness of breath was detected with a sensitivity of 87% and a specificity of 96%. The larger version of Llama 2 with 70 billion parameters outperformed the smaller version with 7 billion parameters in all tasks. Prompt engineering improved zero-shot performance, particularly for smaller model sizes.

Conclusion
Our study demonstrates that locally deployed LLMs can extract clinical information from free text. The hardware requirements are so low that not only on-premise but also point-of-care deployment of LLMs is possible.
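The workflow described in the abstract (ask a model one question per clinical feature, collect the answers into a table, then score them against expert labels) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the prompt wording, the `ask_model` callable, the function names, and the toy `keyword_model` stand-in are all assumptions made for the example. Only the five feature names and the use of sensitivity and specificity come from the abstract.

```python
from typing import Callable, Dict, List, Tuple

# The five clinical features named in the abstract.
FEATURES = [
    "abdominal pain",
    "shortness of breath",
    "confusion",
    "liver cirrhosis",
    "ascites",
]


def build_zero_shot_prompt(report: str, feature: str) -> str:
    """Build a yes/no question for one feature (hypothetical wording)."""
    return (
        "You are reading a patient medical history.\n"
        f"Question: Does the text indicate {feature}? Answer only 'yes' or 'no'.\n\n"
        f"Text:\n{report}\n\nAnswer:"
    )


def parse_answer(raw: str) -> bool:
    """Map a free-text model reply to a boolean; anything not starting with 'yes' is False."""
    return raw.strip().lower().startswith("yes")


def extract_table(
    reports: List[str], ask_model: Callable[[str], str]
) -> List[Dict[str, bool]]:
    """Return one row per report with one boolean column per feature.

    `ask_model` is an assumed interface: any function that takes a prompt
    string and returns the model's reply (for example, a wrapper around a
    locally hosted LLM).
    """
    return [
        {f: parse_answer(ask_model(build_zero_shot_prompt(r, f))) for f in FEATURES}
        for r in reports
    ]


def sensitivity_specificity(
    predicted: List[bool], truth: List[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def keyword_model(prompt: str) -> str:
    """Toy stand-in for an LLM: answers 'yes' if the feature name appears in the text."""
    question, text = prompt.split("Text:\n", 1)
    feature = question.split("indicate ", 1)[1].split("?", 1)[0]
    return "yes" if feature in text.lower() else "no"
```

In practice `keyword_model` would be replaced by a call to the locally deployed model, and the per-feature columns of the resulting table would be passed to `sensitivity_specificity` together with the expert-provided labels.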