Large language models for accurate disease detection in electronic health records: the examples of crystal arthropathies.

RMD Open · Rheumatology (JCR Q1) · Zone 2, Medicine · IF 5.1 · Pub Date: 2024-12-20 · DOI: 10.1136/rmdopen-2024-005003

Nils Bürgisser, Etienne Chalot, Samia Mehouachi, Clement P Buclin, Kim Lauper, Delphine S Courvoisier, Denis Mongin

RMD Open, volume 10, issue 4. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11664341/pdf/

Citations: 0

Abstract


Objectives: We propose and test a framework to detect disease diagnoses using a recent large language model (LLM), Meta's Llama-3-8B, on French-language electronic health record (EHR) documents. Specifically, the study focuses on detecting gout ('goutte' in French), a ubiquitous French term that has multiple meanings beyond the disease. It compares the performance of the LLM-based framework with traditional natural language processing techniques and tests its dependence on the parameters used.

Methods: The framework was developed using a training and testing set of 700 paragraphs assessing 'gout' from a random selection of EHR documents from a tertiary university hospital in Geneva, Switzerland. All paragraphs were manually reviewed and classified by two healthcare professionals into disease (true gout) and non-disease (gold standard). The LLM's accuracy was tested using few-shot and chain-of-thought prompting and compared with a regular expression (regex)-based method, focusing on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs assessing 'Calcium Pyrophosphate Deposition Disease (CPPD)'.
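
The two approaches compared in the Methods can be illustrated with a minimal Python sketch. Nothing below is taken from the paper: the regular expressions, the two example paragraphs, the prompt wording and the names `regex_detects_gout` and `build_prompt` are hypothetical, chosen only to show how a regex baseline and a few-shot, chain-of-thought prompt might be set up. The sketch does not call any model; it only builds the prompt text.

```python
import re

# Hypothetical baseline: these patterns are illustrative, not the study's.
# Matches "goutte", "gouttes", "goutteux", "goutteuse".
GOUT_TERM = re.compile(r"\bgoutt(?:e|es|eux|euse)\b", re.IGNORECASE)
# Contexts where "goutte" usually means a liquid drop rather than the disease.
NON_DISEASE = re.compile(
    r"\bgouttes?\s+(?:oculaires?|nasales?|auriculaires?)\b"
    r"|\b\d+\s*gouttes?\b"
    r"|\bcollyre\b",
    re.IGNORECASE,
)


def regex_detects_gout(paragraph: str) -> bool:
    """Return True if the term appears without an obvious 'drop' context."""
    has_term = GOUT_TERM.search(paragraph) is not None
    looks_like_drops = NON_DISEASE.search(paragraph) is not None
    return has_term and not looks_like_drops


# Hypothetical few-shot examples: (paragraph, reasoning, answer).
FEW_SHOT_EXAMPLES = [
    (
        "Crise de goutte du gros orteil droit, traitée par colchicine.",
        "The text describes a flare in the big toe treated with colchicine, "
        "so 'goutte' refers to the disease.",
        "yes",
    ),
    (
        "Instiller 2 gouttes dans chaque oeil matin et soir.",
        "Here 'gouttes' is a dosage unit for eye drops, not a diagnosis.",
        "no",
    ),
]


def build_prompt(paragraph: str) -> str:
    """Assemble a few-shot prompt that asks for reasoning before the answer."""
    parts = [
        "Decide whether the paragraph states that the patient has gout. "
        "Reason step by step, then answer 'yes' or 'no'.",
    ]
    for text, reasoning, answer in FEW_SHOT_EXAMPLES:
        parts.append(f"Paragraph: {text}\nReasoning: {reasoning}\nAnswer: {answer}")
    # The final block is left open so the model writes its own reasoning.
    parts.append(f"Paragraph: {paragraph}\nReasoning:")
    return "\n\n".join(parts)
```

A hand-written exclusion list like `NON_DISEASE` has to anticipate every non-disease use of the term, which is the kind of ambiguity the abstract highlights for 'goutte'. The prompt instead leaves that disambiguation to the model, guided by a few worked examples.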

Results: The LLM-based algorithm outperformed the regex method, achieving a 92.7% (88.7%-95.4%) positive predictive value, a 96.6% (94.6%-97.8%) negative predictive value and an accuracy of 95.4% (93.6%-96.7%) for gout. In the validation set on CPPD, accuracy was 94.1% (90.2%-97.6%). The LLM framework performed well over a wide range of parameter values.
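
The reported measures (positive predictive value, negative predictive value and accuracy, each with an interval) can be computed from a confusion matrix as sketched below. This is an illustration under assumptions: the abstract does not say how the intervals were obtained, so the Wilson score interval is used here as one common choice, and the function names and any counts passed in are hypothetical rather than the study's data.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return each metric as (estimate, (lower, upper))."""
    n = tp + fp + tn + fn
    return {
        # PPV: share of paragraphs flagged as disease that truly are disease.
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        # NPV: share of paragraphs flagged as non-disease that truly are not.
        "npv": (tn / (tn + fn), wilson_interval(tn, tn + fn)),
        # Accuracy: share of all paragraphs classified correctly.
        "accuracy": ((tp + tn) / n, wilson_interval(tp + tn, n)),
    }
```

Calling `diagnostic_metrics(tp, fp, tn, fn)` with the counts from a manually labelled set returns each estimate together with its interval, in the same "value (lower-upper)" shape as the figures quoted in the Results.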

Conclusion: LLMs accurately detected disease diagnoses from EHRs, even in non-English languages. They could facilitate creating large disease registers in any language, improving disease care assessment and patient recruitment for clinical trials.


Source journal: RMD Open (Rheumatology)

CiteScore: 7.30

Self-citation rate: 6.50%

Articles published: 205

Review time: 14 weeks

About the journal: RMD Open publishes high-quality peer-reviewed original research covering the full spectrum of musculoskeletal disorders, rheumatism and connective tissue diseases, including osteoporosis, spine and rehabilitation. Clinical and epidemiological research, basic and translational medicine, interesting clinical cases, and smaller studies that add to the literature are all considered.