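The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.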

IF 2.8 | Zone 3, Medicine | Q2 Clinical Neurology | Neurosurgical Review, vol. 47 (1), 892 | Pub Date: 2024-12-07 | DOI: 10.1007/s10143-024-03144-y
Edgar Dominic A Bongco, Sean Kendrich N Cua, Mary Angeline Luz U Hernandez, Juan Silvestre G Pascual, Kathleen Joy O Khu
Citations: 0

Abstract

The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.

Objective: Large language models and ChatGPT have been used in different fields of medical education. This study aimed to review the literature on the performance of ChatGPT in neurosurgery board examination-like questions compared to neurosurgery residents.

Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model, with alpha set at 0.05.
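As an illustration of the pooling step named in the Methods, the following is a minimal sketch of inverse-variance fixed-effect pooling with a 95% confidence interval. The abstract does not state which effect measure the authors used, so the log odds ratio here is an assumption, and the function names and all counts are hypothetical, not data from the reviewed studies.

```python
import math


def log_odds_ratio(correct_a, total_a, correct_b, total_b):
    """Return (log odds ratio, variance) for group A versus group B.

    Assumes every cell of the 2x2 table is non-zero.
    """
    a, b = correct_a, total_a - correct_a   # group A: correct / incorrect
    c, d = correct_b, total_b - correct_b   # group B: correct / incorrect
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def fixed_effect_pool(effects, variances, z=1.96):
    """Inverse-variance fixed-effect estimate with a z-based confidence interval."""
    weights = [1 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    se = math.sqrt(1 / total_weight)
    return pooled, pooled - z * se, pooled + z * se


if __name__ == "__main__":
    # Hypothetical (residents correct, residents total, ChatGPT correct, ChatGPT total).
    # These numbers are made up for illustration only.
    studies = [(70, 100, 60, 100), (140, 200, 150, 200), (65, 100, 55, 100)]
    pairs = [log_odds_ratio(*s) for s in studies]
    effects = [e for e, _ in pairs]
    variances = [v for _, v in pairs]
    pooled, low, high = fixed_effect_pool(effects, variances)
    print(f"pooled OR = {math.exp(pooled):.2f} "
          f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

A fixed-effect model weights each study only by its precision, so a single large study can dominate the pooled estimate.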

Results: After screening, six studies were selected for qualitative and quantitative analysis. Accuracy of ChatGPT ranged from 50.4 to 78.8%, compared to residents' accuracy of 58.3 to 73.7%. Risk of bias was low in 4 out of 6 studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents versus ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on sub-group analysis of studies that used the Self-assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest weighted study skewed the results toward better performance of ChatGPT.
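The heterogeneity statistic and the sensitivity analysis reported in the Results can be sketched as follows. This is a generic illustration of Cochran's Q, the I² statistic, and a leave-one-out re-pooling under a fixed-effect model; it is not the authors' code, the function names are hypothetical, and the inputs are made-up effect sizes rather than values from the six reviewed studies.

```python
def pooled_fixed(effects, variances):
    """Inverse-variance fixed-effect pooled estimate."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def cochran_q_i2(effects, variances):
    """Return (Cochran's Q, I^2 in percent) for a set of study effects."""
    pooled = pooled_fixed(effects, variances)
    q = sum((e - pooled) ** 2 / v for e, v in zip(effects, variances))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def leave_one_out(effects, variances):
    """Pooled estimate after dropping each study in turn."""
    results = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(pooled_fixed(rest_e, rest_v))
    return results


if __name__ == "__main__":
    # Hypothetical log effect sizes and variances, for illustration only.
    effects = [0.60, -0.20, 0.10, -0.30]
    variances = [0.002, 0.03, 0.04, 0.05]
    q, i2 = cochran_q_i2(effects, variances)
    print(f"Q = {q:.1f}, I^2 = {i2:.0f}%")
    for i, est in enumerate(leave_one_out(effects, variances)):
        print(f"without study {i}: pooled = {est:+.2f}")
```

Under inverse-variance weighting, the highest weighted study is the one with the smallest variance, so dropping it can shift or even reverse the pooled direction, which is the pattern the sensitivity analysis describes.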

Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement is necessary before ChatGPT can become a useful and reliable supplementary tool in the delivery of neurosurgical education.

Source journal
Neurosurgical Review (Medicine, Clinical Neurology)
CiteScore: 5.60
Self-citation rate: 7.10%
Articles published: 191
Review time: 6-12 weeks
About the journal: The goal of Neurosurgical Review is to provide a forum for comprehensive reviews on current issues in neurosurgery. Each issue contains up to three reviews, reflecting all important aspects of one topic (a disease or a surgical approach). Comments by a panel of experts within the same issue complete the topic. By providing comprehensive coverage of one topic per issue, Neurosurgical Review combines the topicality of professional journals with the in-depth treatment of a monograph. Original papers of high quality are also welcome.