Performance of artificial intelligence on the Turkish dental specialization exam: can ChatGPT-4.0 and Gemini Advanced achieve results comparable to humans?

IF 3.2 | JCR Q1 (Education & Educational Research) | CAS Zone 2 (Medicine) | BMC Medical Education | Pub Date: 2025-02-10 | DOI: 10.1186/s12909-024-06389-9
Soner Sismanoglu, Belen Sirinoglu Capan
{"title":"Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and gemini advanced achieve comparable results to humans?","authors":"Soner Sismanoglu, Belen Sirinoglu Capan","doi":"10.1186/s12909-024-06389-9","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>AI-powered chatbots have spread to various fields including dental education and clinical assistance to treatment planning. The aim of this study is to assess and compare leading AI-powered chatbot performances in dental specialization exam (DUS) administered in Turkey and compare it with the best performer of that year.</p><p><strong>Methods: </strong>DUS questions for 2020 and 2021 were directed to ChatGPT-4.0 and Gemini Advanced individually. DUS questions were manually entered into AI-powered chatbot in their original form, in Turkish. The results obtained were compared with each other and the year's best performers. Candidates who score at least 45 points on this centralized exam are deemed to have passed and are eligible to select their preferred department and institution. The data was statistically analyzed using Pearson's chi-squared test (p < 0.05).</p><p><strong>Results: </strong>ChatGPT-4.0 received 83.3% correct response rate on the 2020 exam, while Gemini Advanced received 65% correct response rate. On the 2021 exam, ChatGPT-4.0 received 80.5% correct response rate, whereas Gemini Advanced received 60.2% correct response rate. ChatGPT-4.0 outperformed Gemini Advanced in both exams (p < 0.05). AI-powered chatbots performed worse in overall score (for 2020: ChatGPT-4.0, 65,5 and Gemini Advanced, 50.1; for 2021: ChatGPT-4.0, 65,6 and Gemini Advanced, 48.6) when compared to overall scores of the best performer of that year (68.5 points for year 2020 and 72.3 points for year 2021). This poor performance also includes the basic sciences and clinical sciences sections (p < 0.001). 
Additionally, periodontology was the clinical specialty in which both AI-powered chatbots achieved the best results, the lowest performance was determined in the endodontics and orthodontics.</p><p><strong>Conclusion: </strong>AI-powered chatbots, namely ChatGPT-4.0 and Gemini Advanced, passed the DUS by exceeding the threshold score of 45. However, they still lagged behind the top performers of that year, particularly in basic sciences, clinical sciences, and overall score. Additionally, they exhibited lower performance in some clinical specialties such as endodontics and orthodontics.</p>","PeriodicalId":51234,"journal":{"name":"BMC Medical Education","volume":"25 1","pages":"214"},"PeriodicalIF":3.2000,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809121/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Education","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12909-024-06389-9","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}

Abstract

Background: AI-powered chatbots have spread to various fields, including dental education and clinical assistance in treatment planning. The aim of this study was to assess and compare the performance of leading AI-powered chatbots on the Turkish dental specialization exam (DUS) and to compare it with that of the best human performer of each year.

Methods: DUS questions for 2020 and 2021 were posed to ChatGPT-4.0 and Gemini Advanced individually. The questions were entered into each AI-powered chatbot manually, in their original form and in Turkish. The results were compared with each other and with each year's best human performer. Candidates who score at least 45 points on this centralized exam are deemed to have passed and are eligible to select their preferred department and institution. The data were analyzed statistically using Pearson's chi-squared test (p < 0.05).

Results: ChatGPT-4.0 achieved a correct response rate of 83.3% on the 2020 exam, while Gemini Advanced achieved 65%. On the 2021 exam, ChatGPT-4.0 achieved a correct response rate of 80.5%, whereas Gemini Advanced achieved 60.2%. ChatGPT-4.0 outperformed Gemini Advanced on both exams (p < 0.05). The AI-powered chatbots' overall scores (for 2020: ChatGPT-4.0, 65.5 and Gemini Advanced, 50.1; for 2021: ChatGPT-4.0, 65.6 and Gemini Advanced, 48.6) fell short of the overall scores of each year's best performer (68.5 points for 2020 and 72.3 points for 2021). This weaker performance extended to the basic sciences and clinical sciences sections (p < 0.001). Additionally, periodontology was the clinical specialty in which both AI-powered chatbots achieved their best results, whereas the lowest performance was observed in endodontics and orthodontics.
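The chatbot-versus-chatbot comparison reported above can be sketched as a Pearson chi-squared test on a 2×2 table of correct/incorrect counts. The example below is illustrative only: the 120-question exam total, and therefore the absolute counts derived from the reported 2020 percentages (83.3% and 65%), are assumptions, not figures taken from the study.

```python
# Pearson's chi-squared test for a 2x2 contingency table, in pure Python.
# Assumed counts for illustration (not from the paper): a 120-question exam,
# so ChatGPT-4.0 ~100/120 correct (83.3%) and Gemini Advanced 78/120 (65%).

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

CRITICAL_0_05 = 3.841  # chi-squared critical value at df = 1, alpha = 0.05

# Rows: chatbots; columns: correct, incorrect.
stat = chi2_2x2(100, 20, 78, 42)
print(f"chi2 = {stat:.2f}, significant at p < 0.05: {stat > CRITICAL_0_05}")
```

Under these assumed counts the statistic exceeds the df = 1 critical value, consistent with the significant difference (p < 0.05) the authors report between the two chatbots.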

Conclusion: The AI-powered chatbots, namely ChatGPT-4.0 and Gemini Advanced, passed the DUS by exceeding the threshold score of 45. However, they still lagged behind the top human performers of each year, particularly in basic sciences, clinical sciences, and overall score. Additionally, they exhibited lower performance in some clinical specialties, such as endodontics and orthodontics.

Source journal: BMC Medical Education (Education, Scientific Disciplines)
CiteScore: 4.90
Self-citation rate: 11.10%
Annual articles: 795
Review time: 6 months

Journal description: BMC Medical Education is an open access journal publishing original peer-reviewed research articles in relation to the training of healthcare professionals, including undergraduate, postgraduate, and continuing education. The journal has a special focus on curriculum development, evaluations of performance, assessment of training needs, and evidence-based medicine.