Accuracy of Commercial Large Language Model (ChatGPT) to Predict the Diagnosis for Prehospital Patients Suitable for Ambulance Transport Decisions: Diagnostic Accuracy Study.

Prehospital Emergency Care · Impact Factor 2.0 · JCR Q2 (Emergency Medicine) · CAS Tier 3 (Medicine) · Pub Date: 2025-01-01 · Epub: 2025-02-12 · DOI: 10.1080/10903127.2025.2460775
Eric D Miller, Jeffrey Michael Franc, Attila J Hertelendy, Fadi Issa, Alexander Hart, Christina A Woodward, Bradford Newbury, Kiera Newbury, Dana Mathew, Kimberly Whitten-Chung, Eric Bauer, Amalia Voskanyan, Gregory R Ciottone
Citations: 0

Abstract

Objectives: While ambulance transport decisions guided by artificial intelligence (AI) could be useful, little is known about the accuracy of AI in making patient diagnoses based on the prehospital patient care report (PCR). The primary objective of this study was to assess the accuracy of ChatGPT (OpenAI, Inc., San Francisco, CA, USA) in predicting a patient's diagnosis from the PCR, compared against a reference standard assigned by experienced paramedics. The secondary objective was to classify cases where the AI diagnosis did not agree with the reference standard as paramedic correct, ChatGPT correct, or equally correct.

Methods: This diagnostic accuracy study used a zero-shot learning model and greedy decoding. A convenience sample of PCRs from paramedic students was analyzed by an untrained ChatGPT-4 model to determine the single most likely diagnosis. A reference standard was provided by an experienced paramedic reviewing each PCR and giving a differential diagnosis of three items. A trained prehospital professional assessed the ChatGPT diagnosis as concordant or non-concordant with one of the three paramedic diagnoses. If non-concordant, two board-certified emergency physicians independently decided if the ChatGPT or the paramedic diagnosis was more likely to be correct.
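The zero-shot, greedy-decoding setup described in the Methods can be sketched with the OpenAI Python client. The prompt wording and model identifier below are illustrative assumptions, not the study's published protocol (the abstract does not include the prompt); `temperature=0` is the usual way to approximate greedy decoding, i.e. always taking the highest-probability token.

```python
# Hypothetical sketch of the study's zero-shot diagnostic query.
# The prompt text and model name are assumptions; the abstract does not
# publish the exact prompt used.

def build_prompt(pcr_text: str) -> str:
    """Zero-shot prompt: no worked examples, just the task and the PCR."""
    return (
        "Based on the following prehospital patient care report, state the "
        "single most likely diagnosis. Reply with the diagnosis only.\n\n"
        + pcr_text
    )

def predict_diagnosis(pcr_text: str, model: str = "gpt-4") -> str:
    from openai import OpenAI  # third-party client, imported lazily

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # approximates greedy decoding
        messages=[{"role": "user", "content": build_prompt(pcr_text)}],
    )
    return response.choices[0].message.content.strip()
```

With greedy decoding the model returns the same single diagnosis for a given PCR on every run, which is what a reproducible concordance comparison against the paramedic reference standard requires.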

Results: ChatGPT-4 diagnosed 78/104 (75.0%) of PCRs correctly (95% confidence interval: 65.3-82.7%). Among the 26 cases of disagreement, judgment by the emergency physicians was that in 6/26 (23.0%) the paramedic diagnosis was more likely to be correct. There was only one case of the 104 (0.96%) where transport decisions based on the AI guided diagnosis would have been potentially dangerous to the patient (under-triage).
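The headline interval around 78/104 can be checked with a standard binomial confidence interval. The abstract does not state which method the authors used, so the Wilson score interval below is an illustrative assumption; it gives roughly 65.9-82.3%, close to but slightly narrower than the reported 65.3-82.7%, which would be consistent with an exact (Clopper-Pearson) interval instead.

```python
# Wilson score 95% CI for 78 correct diagnoses out of 104 PCRs.
# Shown as one standard choice; the paper's interval method is not stated.
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_ci(78, 104)
print(f"Accuracy {78/104:.1%}, 95% CI {lo:.1%}-{hi:.1%}")
```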

Conclusions: In this study, the overall accuracy of ChatGPT in diagnosing patients from their emergency medical services PCR was 75.0%. In cases where the ChatGPT diagnosis was considered less likely than the paramedic diagnosis, the AI diagnosis was most commonly more critical than the paramedic diagnosis, potentially leading to over-triage. The under-triage rate was <1%.

Source journal: Prehospital Emergency Care (Medicine - Public, Environmental & Occupational Health)

CiteScore: 4.30 · Self-citation rate: 12.50% · Articles per year: 137 · Review time: 1 month

Journal description: Prehospital Emergency Care publishes peer-reviewed information relevant to the practice, educational advancement, and investigation of prehospital emergency care, including the following types of articles: Special Contributions, Original Articles, Education and Practice, Preliminary Reports, Case Conferences, Position Papers, Collective Reviews, Editorials, Letters to the Editor, and Media Reviews.