Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and Gemini Advanced achieve comparable results to humans?

IF 2.7 · CAS Tier 2 (Medicine) · JCR Q1 (EDUCATION & EDUCATIONAL RESEARCH) · BMC Medical Education · Pub Date: 2025-02-10 · DOI: 10.1186/s12909-024-06389-9
Soner Sismanoglu, Belen Sirinoglu Capan
Soner Sismanoglu, Belen Sirinoglu Capan. BMC Medical Education 25(1):214, 2025.
Citations: 0

Abstract

Background: AI-powered chatbots have spread to various fields, including dental education, clinical assistance, and treatment planning. The aim of this study is to assess and compare the performance of leading AI-powered chatbots on the dental specialization exam (DUS) administered in Turkey, and to compare them with the best human performer of each year.

Methods: DUS questions for 2020 and 2021 were presented to ChatGPT-4.0 and Gemini Advanced individually. The questions were manually entered into each AI-powered chatbot in their original form, in Turkish. The results were compared with each other and with each year's best performer. Candidates who score at least 45 points on this centralized exam are deemed to have passed and are eligible to select their preferred department and institution. The data were statistically analyzed using Pearson's chi-squared test (p < 0.05).
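The comparison described above can be sketched as a Pearson's chi-squared test on a 2x2 contingency table of correct vs. incorrect answers for the two chatbots. The counts below are illustrative assumptions (a hypothetical 120-question exam scaled from the reported 2020 percentages), not the paper's raw data:

```python
# Minimal sketch of the study's statistical analysis: Pearson's chi-squared
# test on a 2x2 table [[correct, incorrect], [correct, incorrect]].
# Pure stdlib; no continuity correction, matching the plain Pearson formula.

def pearson_chi2_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]],
    using the shortcut formula n*(ad - bc)^2 / (row1*row2*col1*col2)."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Assumed counts for 2020 (illustrative, not from the paper's raw data):
# ChatGPT-4.0: 100/120 correct (83.3%); Gemini Advanced: 78/120 (65%).
chi2 = pearson_chi2_2x2(100, 20, 78, 42)

# Critical value for df=1 at alpha = 0.05 is 3.841.
print(f"chi2 = {chi2:.2f}, significant at p<0.05: {chi2 > 3.841}")
```

With these assumed counts the statistic is about 10.5, well above the 3.841 cutoff, consistent with the paper's finding that ChatGPT-4.0 significantly outperformed Gemini Advanced.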

Results: ChatGPT-4.0 achieved an 83.3% correct response rate on the 2020 exam, while Gemini Advanced achieved 65%. On the 2021 exam, ChatGPT-4.0 achieved an 80.5% correct response rate, whereas Gemini Advanced achieved 60.2%. ChatGPT-4.0 outperformed Gemini Advanced on both exams (p < 0.05). Both AI-powered chatbots scored lower overall (for 2020: ChatGPT-4.0, 65.5 and Gemini Advanced, 50.1; for 2021: ChatGPT-4.0, 65.6 and Gemini Advanced, 48.6) than the best performer of each year (68.5 points for 2020 and 72.3 points for 2021). This underperformance extended to the basic sciences and clinical sciences sections (p < 0.001). Additionally, periodontology was the clinical specialty in which both AI-powered chatbots achieved their best results, while the lowest performance was observed in endodontics and orthodontics.

Conclusion: AI-powered chatbots, namely ChatGPT-4.0 and Gemini Advanced, passed the DUS by exceeding the threshold score of 45. However, they still lagged behind the top performers of that year, particularly in basic sciences, clinical sciences, and overall score. Additionally, they exhibited lower performance in some clinical specialties such as endodontics and orthodontics.

Source journal: BMC Medical Education (EDUCATION, SCIENTIFIC DISCIPLINES)
CiteScore: 4.90 · Self-citation rate: 11.10% · Articles per year: 795 · Review time: 6 months
About the journal: BMC Medical Education is an open access journal publishing original peer-reviewed research articles in relation to the training of healthcare professionals, including undergraduate, postgraduate, and continuing education. The journal has a special focus on curriculum development, evaluations of performance, assessment of training needs and evidence-based medicine.