Simon A Amacher, Sira M Baumann, Sebastian Berger, Armon Arpagaus, Simon B Egli, Pascale Grzonka, Paulina S C Kliem, Sabina Hunziker, Urs Fisch, Caroline E Gebhard, Raoul Sutter
Epilepsia. Journal Article. Published 2024-12-26. DOI: 10.1111/epi.18215 (https://doi.org/10.1111/epi.18215). Impact factor 6.6; JCR Q1, Clinical Neurology.
Can the large language model ChatGPT-4omni predict outcomes in adult patients with status epilepticus?
Objective: Large language models (LLMs) have recently gained attention for clinical decision-making and diagnosis. This study evaluates the performance of the recently updated LLM (ChatGPT-4o) in predicting clinical outcomes in patients with status epilepticus and compares its prognostic performance to the Status Epilepticus Severity Score (STESS).
Methods: This retrospective single-center cohort study was performed at the University Hospital Basel (tertiary academic medical center) from January 2005 to December 2022. It included consecutive adult patients (≥18 years of age) with a diagnosis of status epilepticus. The primary outcome was survival at hospital discharge, and the secondary outcome was return to premorbid neurological function at hospital discharge. The performance characteristics of ChatGPT-4o (sensitivity, specificity, Youden Index) were evaluated and compared to those of the STESS.
Results: Of 760 patients, 689 patients (90.7%) survived to discharge, and 317 survivors (41.7%) regained their premorbid neurological function at discharge. ChatGPT-4o predicted survival in 567 of 760 patients (74.6%), of whom 45 died. ChatGPT-4o predicted death in 193 of 760 patients (25.4%), of whom 167 survived, resulting in a sensitivity of 75.8% and a specificity of 36.6% (Youden Index 0.12, 95% confidence interval [CI] 0-0.28) for predicting survival. ChatGPT-4o predicted return to premorbid neurological function in 249 of 760 patients (32.8%), of whom 112 did not return to their premorbid neurological function. ChatGPT-4o predicted no return to premorbid function in 511 of 760 patients (67.2%), of whom 180 returned to their premorbid function, resulting in a sensitivity of 43.2% and a specificity of 74.7% (Youden Index 0.12, 95% CI 0.08-0.28) for predicting return to premorbid neurological function. There was no difference in the prognostic performance of ChatGPT-4o and the STESS. A second round of prompting did not increase the predictive performance of ChatGPT-4o.
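The performance figures above follow from the reported counts. The sketch below is illustrative only and is not the authors' analysis code: it rebuilds each 2x2 table from the counts in the Results and recomputes sensitivity, specificity, and the Youden Index (sensitivity + specificity - 1). The class `Confusion` and the helper `from_report` are hypothetical names introduced here, and the sketch assumes the "positive" class is the favorable outcome (survival, or return to premorbid function). It does not reproduce the confidence intervals, whose method the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table; 'positive' means the favorable outcome (assumption)."""
    tp: int  # predicted favorable, outcome favorable
    fp: int  # predicted favorable, outcome unfavorable
    fn: int  # predicted unfavorable, outcome favorable
    tn: int  # predicted unfavorable, outcome unfavorable

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def youden(self) -> float:
        return self.sensitivity + self.specificity - 1.0


def from_report(pred_pos: int, pred_pos_wrong: int,
                pred_neg: int, pred_neg_wrong: int) -> Confusion:
    """Build the table from counts phrased as in the abstract:
    'predicted X in N patients, of whom M had the opposite outcome'."""
    return Confusion(
        tp=pred_pos - pred_pos_wrong,
        fp=pred_pos_wrong,
        fn=pred_neg_wrong,
        tn=pred_neg - pred_neg_wrong,
    )


# Primary outcome: 567 predicted to survive (45 died),
# 193 predicted to die (167 survived).
survival = from_report(567, 45, 193, 167)

# Secondary outcome: 249 predicted to recover (112 did not),
# 511 predicted not to recover (180 did).
recovery = from_report(249, 112, 511, 180)

for name, table in [("survival", survival), ("recovery", recovery)]:
    print(f"{name}: sensitivity={table.sensitivity:.3f} "
          f"specificity={table.specificity:.3f} "
          f"youden={table.youden:.2f}")
```

Recomputed this way, the sensitivities and specificities match the reported percentages for both outcomes. For the secondary outcome, the counts give a Youden Index of roughly 0.18, which lies at the center of the reported interval but differs from the printed point estimate, so that figure is worth checking against the full article.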
Significance: ChatGPT-4o unreliably predicts outcomes in patients with status epilepticus. Clinicians should refrain from using ChatGPT-4o for prognostication in these patients.
About the journal:
Epilepsia is the leading, authoritative source for innovative clinical and basic science research for all aspects of epilepsy and seizures. In addition, Epilepsia publishes critical reviews, opinion pieces, and guidelines that foster understanding and aim to improve the diagnosis and treatment of people with seizures and epilepsy.