Accuracy of Commercial Large Language Model (ChatGPT) to Predict the Diagnosis for Prehospital Patients Suitable for Ambulance Transport Decisions: Diagnostic Accuracy Study.
Eric D Miller, Jeffrey Michael Franc, Attila J Hertelendy, Fadi Issa, Alexander Hart, Christina A Woodward, Bradford Newbury, Kiera Newbury, Dana Mathew, Kimberly Whitten-Chung, Eric Bauer, Amalia Voskanyan, Gregory R Ciottone
{"title":"Accuracy of Commercial Large Language Model (ChatGPT) to Predict the Diagnosis for Prehospital Patients Suitable for Ambulance Transport Decisions: Diagnostic Accuracy Study.","authors":"Eric D Miller, Jeffrey Michael Franc, Attila J Hertelendy, Fadi Issa, Alexander Hart, Christina A Woodward, Bradford Newbury, Kiera Newbury, Dana Mathew, Kimberly Whitten-Chung, Eric Bauer, Amalia Voskanyan, Gregory R Ciottone","doi":"10.1080/10903127.2025.2460775","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>While ambulance transport decisions guided by artificial intelligence (AI) could be useful, little is known of the accuracy of AI in making patient diagnoses based on the pre-hospital patient care report (PCR). The primary objective of this study was to assess the accuracy of ChatGPT (OpenAI, Inc., San Francisco, CA, USA) to predict a patient's diagnosis using the PCR by comparing to a reference standard assigned by experienced paramedics. The secondary objective was to classify cases where the AI diagnosis did not agree with the reference standard as paramedic correct, ChatGPT correct, or equally correct.</p><p><strong>Methods: </strong>This diagnostic accuracy study used a zero-shot learning model and greedy decoding. A convenience sample of PCRs from paramedic students was analyzed by an untrained ChatGPT-4 model to determine the single most likely diagnosis. A reference standard was provided by an experienced paramedic reviewing each PCR and giving a differential diagnosis of three items. A trained prehospital professional assessed the ChatGPT diagnosis as concordant or non-concordant with one of the three paramedic diagnoses. If non-concordant, two board-certified emergency physicians independently decided if the ChatGPT or the paramedic diagnosis was more likely to be correct.</p><p><strong>Results: </strong>ChatGPT-4 diagnosed 78/104 (75.0%) of PCRs correctly (95% confidence interval: 65.3-82.7%). Among the 26 cases of disagreement, judgment by the emergency physicians was that in 6/26 (23.0%) the paramedic diagnosis was more likely to be correct. There was only one case of the 104 (0.96%) where transport decisions based on the AI guided diagnosis would have been potentially dangerous to the patient (under-triage).</p><p><strong>Conclusions: </strong>In this study, overall accuracy of ChatGPT to diagnose patients based on their emergency medical services PCR was 75.0%. In cases where the ChatGPT diagnosis was considered less likely than paramedic diagnosis, most commonly the AI diagnosis was more critical than the paramedic diagnosis-potentially leading to over-triage. The under-triage rate was <1%.</p>","PeriodicalId":20336,"journal":{"name":"Prehospital Emergency Care","volume":" ","pages":"1-5"},"PeriodicalIF":2.1000,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Prehospital Emergency Care","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1080/10903127.2025.2460775","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EMERGENCY MEDICINE","Score":null,"Total":0}
引用次数: 0
Abstract
Objectives: While ambulance transport decisions guided by artificial intelligence (AI) could be useful, little is known about the accuracy of AI in making patient diagnoses based on the prehospital patient care report (PCR). The primary objective of this study was to assess the accuracy of ChatGPT (OpenAI, Inc., San Francisco, CA, USA) in predicting a patient's diagnosis from the PCR, compared with a reference standard assigned by experienced paramedics. The secondary objective was to classify cases where the AI diagnosis did not agree with the reference standard as paramedic correct, ChatGPT correct, or equally correct.
Methods: This diagnostic accuracy study used a zero-shot learning model and greedy decoding. A convenience sample of PCRs from paramedic students was analyzed by an untrained ChatGPT-4 model to determine the single most likely diagnosis. The reference standard was provided by an experienced paramedic who reviewed each PCR and gave a three-item differential diagnosis. A trained prehospital professional assessed the ChatGPT diagnosis as concordant or non-concordant with one of the three paramedic diagnoses. If non-concordant, two board-certified emergency physicians independently decided whether the ChatGPT or the paramedic diagnosis was more likely to be correct.
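As a rough illustration of the zero-shot, greedy-decoding setup described above, the sketch below asks a GPT-4 model for the single most likely diagnosis from a PCR narrative. The prompt wording, model identifier, and helper function are illustrative assumptions, not the authors' actual protocol; temperature 0 is used to approximate greedy decoding.

```python
# Minimal sketch of a zero-shot, greedy-decoding diagnosis query.
# Assumptions: the prompt wording, model name, and function name are illustrative,
# not the study's actual protocol. Requires the `openai` package and an API key.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def most_likely_diagnosis(pcr_text: str) -> str:
    """Ask a GPT-4 model for the single most likely diagnosis for one PCR."""
    response = client.chat.completions.create(
        model="gpt-4",   # the study used ChatGPT-4; the exact model version is not stated
        temperature=0,   # temperature 0 approximates greedy decoding
        messages=[
            {
                "role": "user",
                "content": (
                    "You are given a prehospital patient care report (PCR). "
                    "State the single most likely diagnosis.\n\n" + pcr_text
                ),
            },
        ],
    )
    return response.choices[0].message.content.strip()

# Example call (hypothetical PCR excerpt):
# print(most_likely_diagnosis("58 y/o male, sudden substernal chest pain, diaphoresis..."))
```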
Results: ChatGPT-4 diagnosed 78/104 (75.0%) of PCRs correctly (95% confidence interval: 65.3-82.7%). Among the 26 cases of disagreement, the emergency physicians judged that in 6/26 (23.0%) the paramedic diagnosis was more likely to be correct. In only one of the 104 cases (0.96%) would a transport decision based on the AI-guided diagnosis have been potentially dangerous to the patient (under-triage).
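The headline accuracy and its interval can be reproduced approximately from the counts reported above. The abstract does not state which binomial interval the authors used, so the Clopper-Pearson (exact) method in the sketch below is an assumption and may differ slightly from the reported bounds.

```python
# Sketch: accuracy and a 95% binomial confidence interval for 78 correct of 104 PCRs.
# The CI method used in the study is not stated; Clopper-Pearson is assumed here,
# so the bounds may differ slightly from the reported 65.3-82.7%.
from statsmodels.stats.proportion import proportion_confint

correct, total = 78, 104
accuracy = correct / total
low, high = proportion_confint(correct, total, alpha=0.05, method="beta")  # "beta" = Clopper-Pearson

print(f"Accuracy: {accuracy:.1%}")        # 75.0%
print(f"95% CI: {low:.1%} - {high:.1%}")  # roughly 65-83%
```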
Conclusions: In this study, the overall accuracy of ChatGPT in diagnosing patients based on their emergency medical services PCR was 75.0%. In cases where the ChatGPT diagnosis was considered less likely than the paramedic diagnosis, the AI diagnosis was most commonly more critical than the paramedic diagnosis, potentially leading to over-triage. The under-triage rate was <1%.
Journal Description:
Prehospital Emergency Care publishes peer-reviewed information relevant to the practice, educational advancement, and investigation of prehospital emergency care, including the following types of articles: Special Contributions - Original Articles - Education and Practice - Preliminary Reports - Case Conferences - Position Papers - Collective Reviews - Editorials - Letters to the Editor - Media Reviews.