Can the large language model ChatGPT-4omni predict outcomes in adult patients with status epilepticus?

Epilepsia (IF 5.6; Zone 1, Medicine; Q1, Clinical Neurology) | Pub Date: 2024-12-26 | DOI: 10.1111/epi.18215

Simon A. Amacher, Sira M. Baumann, Sebastian Berger, Armon Arpagaus, Simon B. Egli, Pascale Grzonka, Paulina S. C. Kliem, Sabina Hunziker, Urs Fisch, Caroline E. Gebhard, Raoul Sutter

Epilepsia, volume 66, issue 3, pages 674-685. Article page: https://onlinelibrary.wiley.com/doi/10.1111/epi.18215

Citations: 0

Abstract

Can the large language model ChatGPT-4omni predict outcomes in adult patients with status epilepticus?

Objective

Large language models (LLMs) have recently gained attention for clinical decision-making and diagnosis. This study evaluates the performance of the recently updated LLM Chat Generative Pre-Trained Transformer-4omni (ChatGPT-4o) in predicting clinical outcomes in patients with status epilepticus and compares its prognostic performance to the Status Epilepticus Severity Score (STESS).

Methods

This retrospective single-center cohort study was performed at the University Hospital Basel (tertiary academic medical center) from January 2005 to December 2022. It included consecutive adult patients (≥18 years of age) with a diagnosis of status epilepticus. The primary outcome was survival at hospital discharge, and the secondary outcome was return to premorbid neurological function at hospital discharge. The performance characteristics of ChatGPT-4o (sensitivity, specificity, Youden Index) were evaluated and compared to those of the STESS.
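For reference, the three performance measures named above follow the standard definitions for a 2×2 table. The symbols TP, FP, FN, and TN (true and false positives and negatives) and J are assumed here for illustration and are not taken from the paper.

```latex
\begin{align}
\text{sensitivity} &= \frac{TP}{TP + FN}, \\
\text{specificity} &= \frac{TN}{TN + FP}, \\
J &= \text{sensitivity} + \text{specificity} - 1 .
\end{align}
```

The Youden Index J ranges from -1 to 1; a value near 0 means the prediction separates the two outcome groups no better than chance.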

Results

Of 760 patients, 689 (90.7%) survived to discharge, and 317 survivors (41.7% of all patients) regained their premorbid neurological function at discharge. ChatGPT-4o predicted survival in 567 of 760 patients (74.6%), of whom 45 died. ChatGPT-4o predicted death in 193 of 760 patients (25.4%), of whom 167 survived, resulting in a sensitivity of 75.8% and a specificity of 36.6% (Youden Index .12, 95% confidence interval [CI] 0–.28) for predicting survival.

ChatGPT-4o predicted return to premorbid neurological function in 249 of 760 patients (32.8%), of whom 112 did not return to their premorbid neurological function. ChatGPT-4o predicted no return to premorbid function in 511 of 760 patients (67.2%), of whom 180 returned to their premorbid function, resulting in a sensitivity of 43.2% and a specificity of 74.7% (Youden Index .12, 95% CI .08–.28) for predicting return to premorbid neurological function.
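The reported sensitivities and specificities can be rebuilt from the counts in the two paragraphs above. The following is a minimal sketch, assuming the "positive" class is the favorable outcome (survival, or return to premorbid function), as the phrasing "for predicting survival" suggests. The names `ConfusionCounts` and `from_abstract` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table with the favorable outcome treated as the positive class."""
    tp: int  # predicted favorable, outcome favorable
    fp: int  # predicted favorable, outcome unfavorable
    fn: int  # predicted unfavorable, outcome favorable
    tn: int  # predicted unfavorable, outcome unfavorable

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def youden(self) -> float:
        return self.sensitivity + self.specificity - 1


def from_abstract(pred_pos: int, pos_wrong: int,
                  pred_neg: int, neg_wrong: int) -> ConfusionCounts:
    """Build the table from counts given in the abstract's form:
    'predicted X in pred_pos patients, of whom pos_wrong did not X;
     predicted not-X in pred_neg patients, of whom neg_wrong did X'."""
    return ConfusionCounts(
        tp=pred_pos - pos_wrong,
        fp=pos_wrong,
        fn=neg_wrong,
        tn=pred_neg - neg_wrong,
    )


# Counts copied from the Results paragraphs.
survival = from_abstract(pred_pos=567, pos_wrong=45, pred_neg=193, neg_wrong=167)
recovery = from_abstract(pred_pos=249, pos_wrong=112, pred_neg=511, neg_wrong=180)

for name, c in (("survival", survival), ("return to function", recovery)):
    print(f"{name}: sensitivity {c.sensitivity:.1%}, "
          f"specificity {c.specificity:.1%}, Youden {c.youden:.2f}")
```

Under these assumptions the sketch reproduces the stated sensitivity and specificity for both outcomes. For return to premorbid function, however, the recomputed Youden Index is about .18, the midpoint of the stated interval .08–.28, rather than the .12 given in the text; the value is worth checking against the published article.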

There was no difference in the prognostic performance of ChatGPT-4o and the STESS. A second round of prompting did not increase the predictive performance of ChatGPT-4o.

Significance

ChatGPT-4o unreliably predicts outcomes in patients with status epilepticus. Clinicians should refrain from using ChatGPT-4o for prognostication in these patients.

Source journal

Epilepsia (Medicine, Clinical Neurology)

CiteScore: 10.90
Self-citation rate: 10.70%
Articles published: 319
Review time: 2-4 weeks

About the journal: Epilepsia is the leading, authoritative source for innovative clinical and basic science research for all aspects of epilepsy and seizures. In addition, Epilepsia publishes critical reviews, opinion pieces, and guidelines that foster understanding and aim to improve the diagnosis and treatment of people with seizures and epilepsy.