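Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study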

Turkish Journal of Emergency Medicine, vol. 23, no. 3, pp. 156-161 · IF 1.1 · Q3 (Emergency Medicine) · Pub Date: 2023-07-01 · DOI: 10.4103/tjem.tjem_79_23

İbrahim Sarbay, Göksu Bozdereli Berikol, İbrahim Ulaş Özturan


Citations: 3

Abstract



Objectives: Artificial intelligence companies have recently stepped up their efforts to improve chatbots, software programs that can converse with a human in natural language. The role of chatbots in health care merits research. OpenAI's ChatGPT is a chatbot built on supervised and reinforcement learning. The aim of this study was to determine the performance of ChatGPT in emergency medicine (EM) triage prediction.

Methods: This was a preliminary, cross-sectional study conducted with case scenarios that the researchers generated from the cases in the Emergency Severity Index (ESI) handbook v4. Two independent EM specialists who were experts in the ESI triage scale determined the triage category for each case. A third independent EM specialist was consulted as arbiter if necessary. The consensus result for each case scenario was taken as the reference triage category. Each case scenario was then submitted to ChatGPT, and its answer was recorded as the index triage category. Disagreements between the ChatGPT category and the reference category were defined as over-triage (false positive) or under-triage (false negative).
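The over-triage and under-triage definition in the Methods can be illustrated with a short sketch. It assumes ESI levels are integers from 1 to 5 with 1 the most acute, so an index category with a lower number than the reference counts as over-triage. The function name `triage_direction` and the sample pairs are hypothetical and are not study data.

```python
def triage_direction(reference_esi: int, index_esi: int) -> str:
    """Compare the index (ChatGPT) category with the reference (expert) category.

    Assumes ESI levels 1-5, where 1 is the most acute.
    """
    for level in (reference_esi, index_esi):
        if level not in range(1, 6):
            raise ValueError(f"ESI level must be 1-5, got {level}")
    if index_esi < reference_esi:
        return "over-triage"   # index is more acute than the reference (false positive)
    if index_esi > reference_esi:
        return "under-triage"  # index is less acute than the reference (false negative)
    return "match"


# Hypothetical (reference, index) pairs for illustration only; not study data.
cases = [(2, 1), (3, 3), (2, 4)]
labels = [triage_direction(ref, idx) for ref, idx in cases]
print(labels)
```

This sketch handles a single index category per case; the cases in which ChatGPT reported two consecutive categories would need a separate rule that the abstract does not specify.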

Results: Fifty case scenarios were assessed. Reliability analysis showed fair agreement between the EM specialists and ChatGPT (Cohen's kappa: 0.341). ChatGPT over-triaged 11 cases (22%) and under-triaged 9 cases (18%). In 9 cases (18%), ChatGPT reported two consecutive triage categories, one of which matched the expert consensus. Overall, it had a sensitivity of 57.1% (95% confidence interval [CI]: 34-78.2), a specificity of 34.5% (95% CI: 17.9-54.3), a positive predictive value (PPV) of 38.7% (95% CI: 21.8-57.8), a negative predictive value (NPV) of 52.6% (95% CI: 28.9-75.6), and an F1 score of 0.461. For high-acuity cases (ESI-1 and ESI-2), ChatGPT showed a sensitivity of 76.2% (95% CI: 52.8-91.8), a specificity of 93.1% (95% CI: 77.2-99.2), a PPV of 88.9% (95% CI: 65.3-98.6), an NPV of 84.4% (95% CI: 67.2-94.7), and an F1 score of 0.821. The receiver operating characteristic curve for high-acuity cases showed an area under the curve of 0.846 (95% CI: 0.724-0.969, P < 0.001).
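For readers who want to see how the reported metrics relate to one another, here is a minimal sketch that derives sensitivity, specificity, PPV, NPV, and F1 from a 2x2 confusion table. The abstract does not report raw counts; the counts used below (16 true positives, 2 false positives, 5 false negatives, 27 true negatives) are an assumption, back-calculated as one set of integers summing to 50 that is consistent with the reported high-acuity percentages. The names `ConfusionCounts` and `diagnostic_metrics` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # positive by reference and by index test
    fp: int  # negative by reference, positive by index test
    fn: int  # positive by reference, negative by index test
    tn: int  # negative by reference and by index test


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard diagnostic accuracy metrics from a 2x2 table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# ASSUMPTION: counts are back-calculated from the reported percentages for
# high-acuity cases (ESI-1 and ESI-2); they are not given in the abstract.
high_acuity = ConfusionCounts(tp=16, fp=2, fn=5, tn=27)
results = diagnostic_metrics(high_acuity)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, sensitivity is 16/21 (about 76.2%), specificity 27/29 (about 93.1%), PPV 16/18 (about 88.9%), NPV 27/32 (about 84.4%), and F1 32/39 (about 0.821), which agrees with the high-acuity figures reported above.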

Conclusion: ChatGPT performed best when predicting high-acuity cases (ESI-1 and ESI-2). It may be useful for identifying cases that require critical care. If trained with more medical knowledge, ChatGPT may predict the other triage categories more accurately.


Source journal

Turkish Journal of Emergency Medicine (Emergency Medicine)

CiteScore: 1.70
Self-citation rate: 0.00%
Articles published: (not listed)
Review time: 22 weeks

Journal description: The Turkish Journal of Emergency Medicine (Turk J Emerg Med) is an international, peer-reviewed, open-access journal that publishes clinical and experimental trials, case reports, invited reviews, case images, letters to the Editor, and interesting research conducted in all fields of Emergency Medicine. The Journal is the official scientific publication of the Emergency Medicine Association of Turkey (EMAT) and is printed four times a year, in January, April, July, and October. The language of the journal is English. The Journal follows the principles of independent, unbiased, double-blind peer review. Only unpublished papers that are not under review for publication elsewhere can be submitted. The authors are responsible for the scientific content of the material to be published. The Turkish Journal of Emergency Medicine reserves the right to request any research materials on which the paper is based. The Editorial Board of the Turkish Journal of Emergency Medicine and the Publisher adhere to the principles of the International Committee of Medical Journal Editors, the World Association of Medical Editors, the Council of Science Editors, the Committee on Publication Ethics, the US National Library of Medicine, the US Office of Research Integrity, the European Association of Science Editors, and the International Society of Managing and Technical Editors.