Enhancing Adverse Event Reporting With Clinical Language Models: Inpatient Falls.
Insook Cho, Hyunchul Park, Byeong Sun Park, Dong-Geon Lee
Journal of Advanced Nursing, 13 February 2025. DOI: 10.1111/jan.16812
Abstract
Aims: To develop a method for computationally detecting fall events using clinical language models to complement existing self-reporting mechanisms.
Design: Retrospective observational study.
Methods: Text data were collected from the unstructured nursing notes of three hospitals' electronic health records and the Korean national patient safety reports, totalling 34,480 records covering the period from January 2015 to December 2019. Note-level labelling was conducted by two researchers with 95% agreement. Preprocessing, comprising data anonymisation and English translation, was followed by semantic validation. Five language models based on pretrained Bidirectional Encoder Representations from Transformers (BERT) and Generative Pretrained Transformer (GPT)-4 with prompt programming were explored. Model performance was assessed using F measures. Error analysis was conducted for the GPT-4 results.
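For readers unfamiliar with the fine-tuning step, the sketch below shows how a pretrained Bio+Clinical BERT checkpoint can be fine-tuned as a note-level binary classifier for fall events. The checkpoint name, hyperparameters and example notes are illustrative assumptions, not the authors' configuration.

```python
# A minimal sketch of note-level fall classification by fine-tuning a clinical BERT.
# Checkpoint, hyperparameters and the two example notes are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # a public Bio+Clinical BERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical labelled notes: 1 = note documents a fall event, 0 = no fall event.
notes = ["Patient found on the floor beside the bed at 03:00; c/o left hip pain.",
         "Ambulated in the hallway with a walker; tolerated well, no incidents."]
labels = [1, 0]

dataset = Dataset.from_dict({"text": notes, "label": labels}).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=512),
    batched=True,
)

args = TrainingArguments(output_dir="fall-detector", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=dataset).train()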
Results: Fine-tuned BERT models with the English data set outperformed GPT-4, with Bio+Clinical BERT achieving the highest F1 score of 0.98. Fine-tuned Korean BERT with the Korean data set also reached an F1 score of 0.98, while GPT-4 achieved a competitive F1 score of 0.94. GPT-4 with prompt programming showed much higher F1 scores than GPT-4 with a standardised prompt for both the English data set (0.85 vs. 0.39) and the Korean data set (0.94 vs. 0.03). The error analysis identified common misclassification patterns: fall history and homonyms caused false positives, while implicit expressions and missing contextual information caused false negatives.
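The gap between prompt programming and a standardised prompt can be made concrete with a hedged sketch such as the one below; the prompt wording and model identifier are assumptions and do not reproduce the study's prompts. Predictions from either model family can then be scored against the note-level labels with a standard F1 computation (e.g., sklearn.metrics.f1_score).

```python
# A hedged illustration of prompt programming for fall detection with GPT-4.
# The prompts and model identifier below are assumptions, not the study's materials.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

GENERIC_PROMPT = "Does this nursing note describe a patient fall? Answer yes or no."

ENGINEERED_PROMPT = (
    "You are screening inpatient nursing notes for adverse-event surveillance. "
    "Label a note FALL only if it documents a fall that occurred during the current "
    "admission. Ignore fall history, fall-risk scores, and fall-prevention education "
    "(common sources of false positives). Reply with exactly one word: FALL or NO_FALL."
)

def classify(note: str, system_prompt: str) -> str:
    # Temperature 0 keeps the output deterministic for repeatable labelling.
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": note}],
    )
    return response.choices[0].message.content.strip()

print(classify("Pt slid from wheelchair to floor during transfer; no injury noted.",
               ENGINEERED_PROMPT))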
Conclusion: The clinical language model approach, if used alongside existing self-reporting, promises to increase the chance of identifying the majority of actual falls without the need for additional chart reviews.
Impact: Inpatient falls are often underreported, with up to 91% of incidents missed in self-reports. Using language models, we identified a significant portion of these unreported falls, improving the accuracy of adverse event tracking while reducing the self-reporting burden on nurses.
Journal Introduction:
The Journal of Advanced Nursing (JAN) contributes to the advancement of evidence-based nursing, midwifery and healthcare by disseminating high quality research and scholarship of contemporary relevance and with potential to advance knowledge for practice, education, management or policy.
All JAN papers are required to have a sound scientific, evidential, theoretical or philosophical base and to be critical, questioning and scholarly in approach. As an international journal, JAN promotes diversity of research and scholarship in terms of culture, paradigm and healthcare context. For JAN’s worldwide readership, authors are expected to make clear the wider international relevance of their work and to demonstrate sensitivity to cultural considerations and differences.