Large language models for accurate disease detection in electronic health records: the examples of crystal arthropathies.
Nils Bürgisser, Etienne Chalot, Samia Mehouachi, Clement P Buclin, Kim Lauper, Delphine S Courvoisier, Denis Mongin
RMD Open, Volume 10, Issue 4, published 2024-12-20. DOI: 10.1136/rmdopen-2024-005003. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11664341/pdf/
Objectives: We propose and test a framework for detecting disease diagnoses in French-language electronic health record (EHR) documents using a recent large language model (LLM), Meta's Llama-3-8B. Specifically, it focuses on detecting gout ('goutte' in French), a ubiquitous French word that has several meanings beyond the disease (for example, 'drop', as in a drop of liquid). The study compares the performance of the LLM-based framework with traditional natural language processing techniques and tests how sensitive it is to the parameters used.
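To illustrate why a keyword baseline struggles with 'goutte', here is a minimal Python sketch of a rule-based matcher. The patterns (`GOUT_PATTERN`, `EXCLUSIONS`) and function names are hypothetical and are not the regex method used in the study; they only show how literal matching flags non-disease uses such as 'goutte-à-goutte' (drip infusion) or 'gouttes oculaires' (eye drops) unless each phrase is excluded by hand.

```python
import re

# Hypothetical keyword pattern (not the study's regex): "goutte" or
# "gouttes" as a whole word, case-insensitive.
GOUT_PATTERN = re.compile(r"\bgouttes?\b", re.IGNORECASE)

# Hypothetical hand-written exclusions for common non-disease uses.
EXCLUSIONS = [
    re.compile(r"goutte[-\s]à[-\s]goutte", re.IGNORECASE),        # drip infusion
    re.compile(r"\bgouttes?\s+(?:oculaires?|auriculaires?|nasales?)\b",
               re.IGNORECASE),                                    # eye/ear/nose drops
    re.compile(r"\bune\s+goutte\s+de\b", re.IGNORECASE),          # "a drop of ..."
]


def naive_match(paragraph: str) -> bool:
    """Flag any paragraph that contains the word 'goutte' or 'gouttes'."""
    return GOUT_PATTERN.search(paragraph) is not None


def regex_with_exclusions(paragraph: str) -> bool:
    """Flag 'goutte' only if none of the known non-disease phrases appear."""
    if not naive_match(paragraph):
        return False
    return not any(p.search(paragraph) for p in EXCLUSIONS)


if __name__ == "__main__":
    samples = [
        "Crise de goutte au gros orteil droit.",  # gout flare (disease)
        "Perfusion au goutte-à-goutte.",          # drip infusion
        "Prescription de gouttes oculaires.",     # eye drops
    ]
    for s in samples:
        print(naive_match(s), regex_with_exclusions(s), s)
```

Every exclusion has to be written by hand, and a paragraph that mentions both eye drops and a gout flare is wrongly rejected. This brittleness is the kind of limitation a context-aware LLM classifier is meant to address.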
Methods: The framework was developed using a training and testing set of 700 paragraphs mentioning 'gout', drawn from a random selection of EHR documents from a tertiary university hospital in Geneva, Switzerland. Two healthcare professionals manually reviewed every paragraph and classified it as disease (true gout) or non-disease; these labels served as the gold standard. The LLM's accuracy was tested using few-shot and chain-of-thought prompting, with a focus on the effects of model parameters and prompt structure, and compared with a regular expression (regex)-based method. The framework was further validated on 600 paragraphs mentioning calcium pyrophosphate deposition disease (CPPD).
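The few-shot, chain-of-thought setup can be sketched as prompt assembly plus answer parsing. This is a hedged, model-agnostic sketch: the instruction text, the `Example` record and the 'Answer: yes/no' convention are assumptions rather than the study's prompt, and `generate` stands for any hypothetical function that sends a prompt to an LLM such as Llama-3-8B and returns its completion, with sampling parameters (e.g. temperature) configured inside it.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Example:
    paragraph: str   # French EHR paragraph containing "goutte"
    reasoning: str   # worked chain-of-thought explanation
    label: bool      # True = disease (true gout), False = other meaning


# Hypothetical instruction text; not the prompt used in the study.
INSTRUCTION = (
    "You read paragraphs from French clinical notes. Decide whether the "
    "paragraph states that the patient has gout (the disease). Think step "
    "by step, then end with 'Answer: yes' or 'Answer: no'."
)

ANSWER_RE = re.compile(r"Answer:\s*(yes|no)\b", re.IGNORECASE)


def build_prompt(examples: Sequence[Example], paragraph: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt ending with the query."""
    parts = [INSTRUCTION, ""]
    for ex in examples:
        parts += [
            f"Paragraph: {ex.paragraph}",
            f"Reasoning: {ex.reasoning}",
            f"Answer: {'yes' if ex.label else 'no'}",
            "",
        ]
    # Leave "Reasoning:" open so the model writes its reasoning first.
    parts += [f"Paragraph: {paragraph}", "Reasoning:"]
    return "\n".join(parts)


def parse_answer(completion: str) -> Optional[bool]:
    """Return the first 'Answer: yes/no' verdict, or None if there is none."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


def classify(paragraph: str, examples: Sequence[Example],
             generate: Callable[[str], str]) -> Optional[bool]:
    """Classify one paragraph; `generate` is a hypothetical LLM call."""
    return parse_answer(generate(build_prompt(examples, paragraph)))


if __name__ == "__main__":
    shots = [Example("Une goutte de sang perle au point de ponction.",
                     "Here 'goutte' means a drop of blood, not the disease.",
                     False)]
    # Stub standing in for a real model call.
    stub = lambda prompt: " An acute flare treated with colchicine. Answer: yes"
    print(classify("Crise de goutte traitée par colchicine.", shots, stub))
```

Taking the first 'Answer:' in the completion guards against the model continuing with invented extra examples, and returning `None` lets unparseable outputs be counted and reviewed rather than silently treated as negative.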
Results: The LLM-based algorithm outperformed the regex method, achieving a 92.7% (88.7%-95.4%) positive predictive value, a 96.6% (94.6%-97.8%) negative predictive value and an accuracy of 95.4% (93.6%-96.7%) for gout. In the validation set on CPPD, accuracy was 94.1% (90.2%-97.6%). The LLM framework performed well over a wide range of parameter values.
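The reported metrics follow from a 2×2 confusion matrix against the gold standard: PPV = TP/(TP+FP), NPV = TN/(TN+FN) and accuracy = (TP+TN)/N. The abstract does not state how its intervals were computed, so the sketch below assumes a 95% Wilson score interval; the counts in the usage lines are hypothetical and do not reproduce the study's data.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes: int, total: int,
                    z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def summarize(tp: int, fp: int, tn: int,
              fn: int) -> Dict[str, Tuple[float, float, float]]:
    """Point estimate and interval for PPV, NPV and accuracy."""
    out = {}
    for name, k, n in (("PPV", tp, tp + fp),
                       ("NPV", tn, tn + fn),
                       ("accuracy", tp + tn, tp + fp + tn + fn)):
        lo, hi = wilson_interval(k, n)
        out[name] = (k / n, lo, hi)
    return out


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for metric, (est, lo, hi) in summarize(tp=90, fp=10, tn=180, fn=20).items():
        print(f"{metric}: {est:.1%} ({lo:.1%}-{hi:.1%})")
```

The Wilson interval is a common choice for proportions near 0 or 1, where the simple normal approximation can extend beyond the [0, 1] range; an exact (Clopper-Pearson) interval would be an equally reasonable alternative.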
Conclusion: LLMs accurately detected disease diagnoses in EHRs, even in a non-English language (French). They could facilitate the creation of large disease registers in any language, improving the assessment of disease care and patient recruitment for clinical trials.
About the journal:
RMD Open publishes high-quality, peer-reviewed original research covering the full spectrum of musculoskeletal disorders, rheumatism and connective tissue diseases, including osteoporosis, spine and rehabilitation. Clinical and epidemiological research, basic and translational medicine, interesting clinical cases, and smaller studies that add to the literature are all considered.