The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.

Neurosurgical Review · Impact Factor 2.5 · Q2 (Clinical Neurology) · CAS Medicine, Tier 3 · Pub Date: 2024-12-07 · DOI: 10.1007/s10143-024-03144-y
Edgar Dominic A Bongco, Sean Kendrich N Cua, Mary Angeline Luz U Hernandez, Juan Silvestre G Pascual, Kathleen Joy O Khu
{"title":"The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.","authors":"Edgar Dominic A Bongco, Sean Kendrich N Cua, Mary Angeline Luz U Hernandez, Juan Silvestre G Pascual, Kathleen Joy O Khu","doi":"10.1007/s10143-024-03144-y","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Large language models and ChatGPT have been used in different fields of medical education. This study aimed to review the literature on the performance of ChatGPT in neurosurgery board examination-like questions compared to neurosurgery residents.</p><p><strong>Methods: </strong>A literature search was performed following PRISMA guidelines, covering the time period of ChatGPT's inception (November 2022) until October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha at 0.05.</p><p><strong>Results: </strong>After screening, six studies were selected for qualitative and quantitative analysis. Accuracy of ChatGPT ranged from 50.4 to 78.8%, compared to residents' accuracy of 58.3 to 73.7%. Risk of bias was low in 4 out of 6 studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents versus ChatGPT (p < 0.00001), with high heterogeneity (I<sup>2</sup> = 96). These findings were similar on sub-group analysis of studies that used the Self-assessment in Neurosurgery (SANS) examination questions. 
However, on sensitivity analysis, removal of the highest weighted study skewed the results toward better performance of ChatGPT.</p><p><strong>Conclusion: </strong>Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although reviewed studies had high heterogeneity. Further improvement is necessary before it can become a useful and reliable supplementary tool in the delivery of neurosurgical education.</p>","PeriodicalId":19184,"journal":{"name":"Neurosurgical Review","volume":"47 1","pages":"892"},"PeriodicalIF":2.5000,"publicationDate":"2024-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurosurgical Review","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10143-024-03144-y","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Objective: Large language models such as ChatGPT have been used in various fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared to neurosurgery residents.

Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's public release (November 2022) until October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were estimated using a fixed-effects model with alpha set at 0.05.
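As an illustration of the statistics named in the Methods and Results (inverse-variance fixed-effects pooling, 95% confidence intervals, and the I² heterogeneity statistic), the sketch below uses made-up effect sizes and variances, not the data from the six reviewed studies:

```python
import math

# Hypothetical per-study effect sizes (e.g., log odds ratios of resident vs.
# ChatGPT accuracy) and their variances; placeholders only, not study data.
effects = [0.45, 0.10, -0.20, 0.30, 0.55, 0.05]
variances = [0.02, 0.05, 0.04, 0.03, 0.01, 0.06]

# Fixed-effects (inverse-variance) pooling: precise studies get larger weights.
weights = [1.0 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
se_pooled = math.sqrt(1.0 / sum(weights))

# 95% confidence interval with alpha = 0.05 (z = 1.96).
lo, hi = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

# Heterogeneity: Cochran's Q, then I^2 = (Q - df) / Q, floored at 0.
Q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
I2 = max(0.0, (Q - df) / Q) * 100.0

print(f"pooled = {pooled:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), I^2 = {I2:.0f}%")
```

An I² near 96%, as reported in the Results, would indicate that nearly all observed variability across studies reflects true between-study differences rather than sampling error.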

Results: After screening, six studies were selected for qualitative and quantitative analysis. ChatGPT's accuracy ranged from 50.4% to 78.8%, compared to residents' accuracy of 58.3% to 73.7%. Risk of bias was low in 4 of the 6 studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar in the subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removing the highest-weighted study shifted the results toward better performance by ChatGPT.

Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement of ChatGPT is necessary before it can become a useful and reliable supplementary tool in the delivery of neurosurgical education.

Source journal: Neurosurgical Review (Medicine, Clinical Neurology)
CiteScore: 5.60
Self-citation rate: 7.10%
Annual article count: 191
Review turnaround: 6-12 weeks
Journal description: The goal of Neurosurgical Review is to provide a forum for comprehensive reviews on current issues in neurosurgery. Each issue contains up to three reviews, reflecting all important aspects of one topic (a disease or a surgical approach). Comments by a panel of experts within the same issue complete the topic. By providing comprehensive coverage of one topic per issue, Neurosurgical Review combines the topicality of professional journals with the in-depth treatment of a monograph. Original papers of high quality are also welcome.
Latest articles in this journal:
- ADC histogram analysis of tumor-infiltrating CD8+ T cell levels in meningioma.
- Circumferential nerve wrapping with muscle autograft: a modified strategy of microvascular decompression for trigeminal neuralgia.
- Correlation of endoscopic third ventriculostomy with postoperative body temperature elevation: a single-center retrospective comparative study.
- Intracranial dural arteriovenous fistulas with deep venous drainage: a single-center retrospective cohort study.
- Microsurgical clipping remains a viable option for the treatment of coilable ruptured middle cerebral artery aneurysms in the endovascular era.