ChatGPT在葡萄牙医学考题中的表现:ChatGPT-3.5 Turbo与ChatGPT- 40 Mini的对比分析

IF 3.2 Q1 EDUCATION, SCIENTIFIC DISCIPLINES JMIR Medical Education Pub Date : 2025-03-05 DOI:10.2196/65108
Filipe Prazeres
{"title":"ChatGPT在葡萄牙医学考题中的表现:ChatGPT-3.5 Turbo与ChatGPT- 40 Mini的对比分析","authors":"Filipe Prazeres","doi":"10.2196/65108","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing evaluations for doctors and improving instructional effectiveness.</p><p><strong>Objective: </strong>This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance to human candidates.</p><p><strong>Methods: </strong>ChatGPT-3.5 Turbo was tested on the first part of the examination (74 questions) on July 18, 2024, and ChatGPT-4o mini on the second part (74 questions) on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, \"Are you sure?\" after providing an answer. Differences between the first and second responses of each model were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models' performance to human candidates. Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05.</p><p><strong>Results: </strong>ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed medical candidates, while ChatGPT-3.5 Turbo had a more moderate performance.</p><p><strong>Conclusions: </strong>This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research.</p>","PeriodicalId":36236,"journal":{"name":"JMIR Medical Education","volume":"11 ","pages":"e65108"},"PeriodicalIF":3.2000,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11902880/pdf/","citationCount":"0","resultStr":"{\"title\":\"ChatGPT's Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini.\",\"authors\":\"Filipe Prazeres\",\"doi\":\"10.2196/65108\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing evaluations for doctors and improving instructional effectiveness.</p><p><strong>Objective: </strong>This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance to human candidates.</p><p><strong>Methods: </strong>ChatGPT-3.5 Turbo was tested on the first part of the examination (74 questions) on July 18, 2024, and ChatGPT-4o mini on the second part (74 questions) on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, \\\"Are you sure?\\\" after providing an answer. Differences between the first and second responses of each model were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models' performance to human candidates. Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05.</p><p><strong>Results: </strong>ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed medical candidates, while ChatGPT-3.5 Turbo had a more moderate performance.</p><p><strong>Conclusions: </strong>This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research.</p>\",\"PeriodicalId\":36236,\"journal\":{\"name\":\"JMIR Medical Education\",\"volume\":\"11 \",\"pages\":\"e65108\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-03-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11902880/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Medical Education\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2196/65108\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION, SCIENTIFIC DISCIPLINES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/65108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
引用次数: 0

摘要

背景:ChatGPT的进步正在通过提供新的评估和学习工具来改变医学教育,有可能加强对医生的评估并提高教学效率。目的:评价ChatGPT-3.5 Turbo和chatgpt - 40 mini在解决欧洲葡萄牙医学考题(2023年全国专科培训资格考试;(PNA),并将它们的表现与人类候选人进行比较。方法:ChatGPT-3.5 Turbo于2024年7月18日参加第一部分考试(74题),chatgpt - 40 mini于2024年7月19日参加第二部分考试(74题)。每个模型都使用其自然语言处理能力生成答案。为了测试一致性,每个模型在给出答案后都会被问到:“你确定吗?”每个模型的第一和第二反应之间的差异分析使用连续性校正McNemar检验。单参数t检验比较了模型与人类候选人的表现。频率和百分比用于分类变量,均值和ci用于数值变量。结果显示,chatgpt - 40 mini在2023 PNA检查中达到65%(48/74)的准确率,超过ChatGPT-3.5 Turbo。chatgpt - 40 mini的表现优于医疗候选人,而ChatGPT-3.5 Turbo的表现更为温和。结论:本研究强调了ChatGPT模型在医学教育中的进步和潜力,强调了在教师监督下仔细实施和进一步研究的必要性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
ChatGPT's Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini.

Background: Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing evaluations for doctors and improving instructional effectiveness.

Objective: This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance to human candidates.

Methods: ChatGPT-3.5 Turbo was tested on the first part of the examination (74 questions) on July 18, 2024, and ChatGPT-4o mini on the second part (74 questions) on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, "Are you sure?" after providing an answer. Differences between the first and second responses of each model were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models' performance to human candidates. Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05.

Results: ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed medical candidates, while ChatGPT-3.5 Turbo had a more moderate performance.

Conclusions: This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
JMIR Medical Education
JMIR Medical Education Social Sciences-Education
CiteScore
6.90
自引率
5.60%
发文量
54
审稿时长
8 weeks
期刊最新文献
Authors' Reply: Big Data, Small Stories: Methodological Considerations for Using Social Media Analytics in Medical Education Research. Big Data, Small Stories: Methodological Considerations for Using Social Media Analytics in Medical Education Research. Students' Perspectives on Digital Psychotherapy-Possible Solutions for Digital Inpatient-Like Care Concepts: Qualitative Interview Study. The Historical Evolution of Microlearning in Health Professions Education: Bibliometric Analysis. Performance of Large Language Models on the Brazilian National Medical Education Examination: Comparative Benchmark Study.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1