Accuracy of a Proprietary Large Language Model in Labeling Obstetric Incident Reports

Joint Commission Journal on Quality and Patient Safety (IF 2.3, Q2, Health Care Sciences & Services), Volume 50, Issue 12, Pages 877-881. Pub Date: 2024-08-06. DOI: 10.1016/j.jcjq.2024.08.001
Jeanene Johnson, MPH, BSN (Quality Improvement Advisor, Quality Improvement Department, Stanford Medicine Children's Health, Palo Alto, California); Conner Brown, BS (Data Scientist, Stanford Medicine Children's Health); Grace Lee, MD, MPH (Professor, Department of Pediatrics, Stanford University School of Medicine, and Chief Quality Officer, Stanford Medicine Children's Health); Keith Morse, MD, MBA (Clinical Associate Professor, Department of Pediatrics, Stanford University School of Medicine, and Medical Director of Clinical Informatics, Stanford Medicine Children's Health)
Citations: 0


Background

Using the data collected through incident reporting systems is challenging because it consists of a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early identification of opportunities to prevent patient harm. This study assessed the capability of a proprietary LLM (GPT-3.5) to automatically label a cross-sectional sample of real-world obstetric incident reports.

Methods

A sample of 370 incident reports submitted to inpatient obstetric units between December 2022 and May 2023 was extracted. Human-annotated labels were assigned by a clinician reviewer and considered the gold standard. The LLM was prompted to label incident reports relying solely on its pretrained knowledge and the information included in the prompt. Primary outcomes assessed were sensitivity, specificity, positive predictive value, and negative predictive value. A secondary outcome assessed the human-perceived quality of the model's justification for the label(s) applied.
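The four primary outcomes are standard confusion-matrix metrics. A minimal sketch of how each label category can be scored against the clinician gold standard (the function name and example counts below are illustrative, not taken from the study's code):

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard confusion-matrix metrics for one label category.

    tp: labels applied by both the model and the reviewer
    fp: labels applied by the model but not the reviewer
    fn: reviewer labels the model missed
    tn: labels neither the model nor the reviewer applied
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of gold-standard labels the model found
        "specificity": tn / (tn + fp),  # share of true negatives the model left unlabeled
        "ppv": tp / (tp + fp),          # precision of the labels the model applied
        "npv": tn / (tn + fn),          # reliability of the model's decision not to label
    }

# Illustrative counts only:
print(diagnostic_metrics(tp=42, fp=37, fn=7, tn=1780))
```

A high NPV with a modest PPV, as reported below, is the typical signature of a screening tool: the model rarely misses a true label but over-applies labels that a human must then filter.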

Results

The LLM demonstrated the ability to label incident reports with high sensitivity and specificity. The model applied a total of 79 labels compared to the reviewer's 49 labels. Overall sensitivity for the model was 85.7%, and specificity was 97.9%. Positive and negative predictive values were 53.2% and 99.6%, respectively. For 60.8% of labels, the reviewer approved of the model's justification for applying the label.
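The reported figures are internally consistent: with 49 gold-standard labels and 85.7% sensitivity, the model must have matched about 42 of them, and 42 correct labels out of the 79 it applied yields the reported PPV. A quick check (the per-cell counts are inferred here; the paper does not publish the full confusion matrix):

```python
# Consistency check on the reported metrics (illustrative reconstruction).
reviewer_labels = 49   # gold-standard positives assigned by the clinician
model_labels = 79      # labels the model applied

tp = 42                         # inferred: 85.7% of 49 reviewer labels matched
fn = reviewer_labels - tp       # 7 reviewer labels the model missed
fp = model_labels - tp          # 37 model labels the reviewer did not assign

sensitivity = tp / (tp + fn)
ppv = tp / (tp + fp)

print(f"sensitivity = {sensitivity:.1%}")  # 85.7%
print(f"PPV         = {ppv:.1%}")          # 53.2%
```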

Conclusion

The proprietary LLM demonstrated the ability to label obstetric incident reports with high sensitivity and specificity. LLMs offer the potential to enable more efficient use of data from incident reporting systems.