Diagnostic Accuracy of ChatGPT for Patients' Triage; a Systematic Review and Meta-Analysis.

Archives of Academic Emergency Medicine (IF 2.9; Q1, Emergency Medicine). Pub Date: 2024-07-30. eCollection Date: 2024-01-01. DOI: 10.22037/aaem.v12i1.2384

Navid Kaboudi, Saeedeh Firouzbakht, Mohammad Shahir Eftekhar, Fatemeh Fayazbakhsh, Niloufar Joharivarnoosfaderani, Salar Ghaderi, Mohammadreza Dehdashti, Yasmin Mohtasham Kia, Maryam Afshari, Maryam Vasaghi-Gharamaleki, Leila Haghani, Zahra Moradzadeh, Fattaneh Khalaj, Zahra Mohammadi, Zahra Hasanabadi, Ramin Shahidi

Archives of Academic Emergency Medicine, vol. 12, no. 1, e60. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11407534/pdf/


Abstract

Introduction: Artificial intelligence (AI), particularly ChatGPT developed by OpenAI, has shown the potential to improve diagnostic accuracy and efficiency in emergency department (ED) triage. This study aims to evaluate the diagnostic performance and safety of ChatGPT in prioritizing patients based on urgency in ED settings.

Methods: A systematic review and meta-analysis were conducted following PRISMA guidelines. Comprehensive literature searches were performed in Scopus, Web of Science, PubMed, and Embase. Studies evaluating ChatGPT's diagnostic performance in ED triage were included. Quality assessment was conducted using the QUADAS-2 tool. Pooled accuracy estimates were calculated using a random-effects model, and heterogeneity was assessed with the I² statistic.
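To make the pooling step concrete, the following is a minimal sketch of how a random-effects pooled accuracy and the I² statistic can be computed. It is not the authors' analysis code: the abstract does not state which estimator or transformation was used, so the DerSimonian-Laird estimator and the arcsine square-root transform are assumptions chosen for illustration, and the function name `pool_accuracy` and the per-study counts are hypothetical.

```python
import math


def pool_accuracy(studies):
    """Pool per-study accuracies with a random-effects model.

    studies: list of (correct, total) pairs, one per study.
    Assumes the DerSimonian-Laird estimator on arcsine square-root
    transformed proportions (an illustrative choice, not the paper's).
    """
    # The arcsine square-root transform stabilizes the variance of a
    # proportion: var(asin(sqrt(p))) is approximately 1 / (4n).
    y = [math.asin(math.sqrt(c / n)) for c, n in studies]
    v = [1.0 / (4.0 * n) for _, n in studies]

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I² is the share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau² to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    sw_re = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    se = math.sqrt(1.0 / sw_re)

    def back(t):
        # Back-transform to a proportion, clamping to the valid range.
        t = min(max(t, 0.0), math.pi / 2)
        return math.sin(t) ** 2

    return {
        "pooled": back(y_re),
        "ci_low": back(y_re - 1.96 * se),
        "ci_high": back(y_re + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical counts, not data from the included studies.
    example = [(95, 100), (50, 100), (70, 100)]
    result = pool_accuracy(example)
    print({k: round(val, 3) for k, val in result.items()})
```

When study accuracies disagree strongly, as in the hypothetical input above, the between-study variance dominates the weights, so the studies contribute almost equally and the confidence interval widens.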

Results: Fourteen studies with a total of 1,412 patients or scenarios were included. ChatGPT 4.0 demonstrated a pooled accuracy of 0.86 (95% CI: 0.64-0.98) with substantial heterogeneity (I² = 93%). ChatGPT 3.5 showed a pooled accuracy of 0.63 (95% CI: 0.43-0.81) with significant heterogeneity (I² = 84%). Funnel plots indicated potential publication bias, particularly for ChatGPT 3.5. Quality assessments revealed varying levels of risk of bias and applicability concerns.

Conclusion: ChatGPT, especially version 4.0, shows promise in improving ED triage accuracy. However, significant variability and potential biases highlight the need for further evaluation and enhancement.
