Claude, ChatGPT, Copilot, and Gemini Performance versus Students in Different Topics of Neuroscience.

IF 1.7 · CAS Tier 4 (Education) · Q2 EDUCATION, SCIENTIFIC DISCIPLINES · Advances in Physiology Education · Pub Date: 2025-01-17 · DOI: 10.1152/advan.00093.2024
Volodymyr Mavrych, Ahmed Yaqinuddin, Olena Bolgova
{"title":"克劳德、ChatGPT、副驾驶和双子星表现对神经科学不同主题学生的影响。","authors":"Volodymyr Mavrych, Ahmed Yaqinuddin, Olena Bolgova","doi":"10.1152/advan.00093.2024","DOIUrl":null,"url":null,"abstract":"<p><p>Despite extensive studies on large language models and their capability to respond to questions from various licensed exams, there has been limited focus on employing chatbots for specific subjects within the medical curriculum, specifically medical neuroscience. This research compared the performances of Claude 3.5 Sonnet (Anthropic), GPT-3.5, GPT-4-1106 (OpenAI), Copilot free version (Microsoft), and Gemini 1.5 Flash (Google) versus students on MCQs from the medical neuroscience course database to evaluate chatbots reliability. 5 successive attempts of each chatbot to answer 200 USMLE-style questions were evaluated based on accuracy, relevance, and comprehensiveness. MCQs were categorized into 12 categories/topics. The results indicated that at the current level of development, selected AI-driven chatbots, on average, can accurately answer 67.2% of MCQs from the medical neuroscience course, which is 7.4% below the students' average. However, Claude and GPT-4 outperformed other chatbots with 83% and 81.7% correct answers, which is better than the average student result. They followed by Copilot - 59.5%, GPT-3.5 - 58.3%, and Gemini - 53.6%. Concerning different categories, Neurocytology, Embryology, and Diencephalon were the three best topics, with average results of 78.1% - 86.7%, and the lowest results were Brainstem, Special senses, and Cerebellum, with 54.4% - 57.7% correct answers. Our study suggested that Claude and GPT-4 are currently two of the most evolved chatbots. They exhibit proficiency in answering MCQs related to neuroscience that surpasses that of the average medical student. This breakthrough indicates a significant milestone in how AI can supplement and enhance educational tools and techniques.</p>","PeriodicalId":50852,"journal":{"name":"Advances in Physiology Education","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Claude, ChatGPT, Copilot, and Gemini Performance versus Students in Different Topics of Neuroscience.\",\"authors\":\"Volodymyr Mavrych, Ahmed Yaqinuddin, Olena Bolgova\",\"doi\":\"10.1152/advan.00093.2024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Despite extensive studies on large language models and their capability to respond to questions from various licensed exams, there has been limited focus on employing chatbots for specific subjects within the medical curriculum, specifically medical neuroscience. This research compared the performances of Claude 3.5 Sonnet (Anthropic), GPT-3.5, GPT-4-1106 (OpenAI), Copilot free version (Microsoft), and Gemini 1.5 Flash (Google) versus students on MCQs from the medical neuroscience course database to evaluate chatbots reliability. 5 successive attempts of each chatbot to answer 200 USMLE-style questions were evaluated based on accuracy, relevance, and comprehensiveness. MCQs were categorized into 12 categories/topics. The results indicated that at the current level of development, selected AI-driven chatbots, on average, can accurately answer 67.2% of MCQs from the medical neuroscience course, which is 7.4% below the students' average. 
However, Claude and GPT-4 outperformed other chatbots with 83% and 81.7% correct answers, which is better than the average student result. They followed by Copilot - 59.5%, GPT-3.5 - 58.3%, and Gemini - 53.6%. Concerning different categories, Neurocytology, Embryology, and Diencephalon were the three best topics, with average results of 78.1% - 86.7%, and the lowest results were Brainstem, Special senses, and Cerebellum, with 54.4% - 57.7% correct answers. Our study suggested that Claude and GPT-4 are currently two of the most evolved chatbots. They exhibit proficiency in answering MCQs related to neuroscience that surpasses that of the average medical student. This breakthrough indicates a significant milestone in how AI can supplement and enhance educational tools and techniques.</p>\",\"PeriodicalId\":50852,\"journal\":{\"name\":\"Advances in Physiology Education\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2025-01-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in Physiology Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1152/advan.00093.2024\",\"RegionNum\":4,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"EDUCATION, SCIENTIFIC DISCIPLINES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Physiology Education","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1152/advan.00093.2024","RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract


Despite extensive studies on large language models and their capability to answer questions from various licensing exams, there has been limited focus on employing chatbots for specific subjects within the medical curriculum, particularly medical neuroscience. This study compared the performance of Claude 3.5 Sonnet (Anthropic), GPT-3.5 and GPT-4-1106 (OpenAI), Copilot free version (Microsoft), and Gemini 1.5 Flash (Google) against students on MCQs from a medical neuroscience course database to evaluate chatbot reliability. Five successive attempts by each chatbot to answer 200 USMLE-style questions were evaluated for accuracy, relevance, and comprehensiveness. The MCQs were grouped into 12 categories/topics. The results indicated that, at the current level of development, the selected AI-driven chatbots could on average accurately answer 67.2% of the MCQs from the medical neuroscience course, 7.4 percentage points below the students' average. However, Claude and GPT-4 outperformed the other chatbots with 83.0% and 81.7% correct answers, respectively, exceeding the average student result. They were followed by Copilot (59.5%), GPT-3.5 (58.3%), and Gemini (53.6%). Across topics, Neurocytology, Embryology, and Diencephalon yielded the best results, averaging 78.1%–86.7% correct, while Brainstem, Special Senses, and Cerebellum yielded the lowest, at 54.4%–57.7%. Our study suggests that Claude and GPT-4 are currently two of the most advanced chatbots: their proficiency in answering neuroscience MCQs surpasses that of the average medical student. This marks a significant milestone in how AI can supplement and enhance educational tools and techniques.
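The evaluation protocol described above (five repeated runs of each chatbot over the same 200 MCQs, scored by chatbot and by topic) is straightforward to reproduce in outline. Below is a minimal sketch of that tallying step; the record layout, names, and example entries are hypothetical illustrations, not the authors' actual data or analysis code.

    from collections import defaultdict

    # Hypothetical response log. In the study, each chatbot answered the same
    # 200 USMLE-style MCQs in 5 successive attempts; each record here is one
    # (chatbot, topic, attempt, answered_correctly) observation.
    responses = [
        ("Claude 3.5 Sonnet", "Neurocytology", 1, True),
        ("GPT-4-1106", "Brainstem", 1, False),
        ("Gemini 1.5 Flash", "Diencephalon", 2, True),
        # ...one record per chatbot x question x attempt
    ]

    def mean_accuracy(records, key_index):
        # Fraction of correct answers, grouped by chatbot (key_index=0)
        # or by topic (key_index=1), pooled across attempts and questions.
        correct = defaultdict(int)
        total = defaultdict(int)
        for rec in records:
            key = rec[key_index]
            total[key] += 1
            correct[key] += int(rec[3])
        return {key: correct[key] / total[key] for key in total}

    by_chatbot = mean_accuracy(responses, 0)  # accuracy per chatbot
    by_topic = mean_accuracy(responses, 1)    # accuracy per topic
    print(by_chatbot)
    print(by_topic)

Run over a complete response log, the same grouping would yield the headline figures reported in the abstract (e.g., per-chatbot accuracy pooled across all five attempts) directly from the raw records.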

Source journal: Advances in Physiology Education
CiteScore: 3.40
Self-citation rate: 19.00%
Articles per year: 100
Review time: >12 weeks
About the journal: Advances in Physiology Education promotes and disseminates educational scholarship in order to enhance teaching and learning of physiology, neuroscience, and pathophysiology. The journal publishes peer-reviewed descriptions of innovations that improve teaching in the classroom and laboratory, essays on education, and review articles based on our current understanding of physiological mechanisms. Submissions that evaluate new technologies for teaching and research, and educational pedagogy, are especially welcome. The audience for the journal includes educators at all levels: K–12, undergraduate, graduate, and professional programs.
Latest articles in this journal:
- How to get recognition for peer review?
- Talk less, listen more.
- Ultrasound technology as a tool to teach basic concepts of physiology and anatomy in undergraduate and graduate courses: a systematic review.
- A qualitative survey on perception of medical students on the use of large language models for educational purposes.
- Study while you sleep: using targeted memory reactivation as an independent research project for undergraduates.