Evaluation of the performance of ChatGPT-4 and ChatGPT-4o as a learning tool in endodontics

International Endodontic Journal (IF 6.4; CAS Zone 1, Medicine; JCR Q1, Dentistry, Oral Surgery & Medicine). Pub Date: 2026-05-10. Epub Date: 2025-03-02. DOI: 10.1111/iej.14217
Esra Arılı Öztürk, Ceren Turan Gökduman, Burhan Can Çanakçi
International Endodontic Journal, volume 59, issue 6, pages 1057-1069. Publication type: Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1111/iej.14217

Abstract

Aims

The aim of this study was to evaluate the accuracy and consistency of responses given by two versions of Chat Generative Pre-trained Transformer (ChatGPT), ChatGPT-4 and ChatGPT-4o, to multiple-choice questions prepared from undergraduate endodontic education topics and asked at different times of the day and on different days.

Methodology

In total, 60 multiple-choice, text-based questions covering 6 topics of undergraduate endodontic education were prepared. Each question was posed to ChatGPT-4 and ChatGPT-4o 3 times a day (morning, noon, and evening) on 3 consecutive days. The accuracy and consistency of the two AIs were compared using SPSS and R (significance level p < .05, 95% confidence interval).
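
The two outcome measures can be illustrated with a minimal sketch. The abstract does not define how accuracy or consistency was scored, so the definitions here are assumptions: accuracy is taken as the share of all recorded responses that match the answer key, and a question counts as consistent only if all of its repeated responses are identical. The function names and the toy data are hypothetical and are not taken from the study.

```python
def accuracy(responses, answer_key):
    """Share of all recorded responses that match the answer key."""
    total = 0
    correct = 0
    for question_id, answers in responses.items():
        for answer in answers:
            total += 1
            if answer == answer_key[question_id]:
                correct += 1
    return correct / total


def consistency(responses):
    """Share of questions whose repeated responses are all identical.

    Assumed definition: the abstract does not state how consistency was scored.
    """
    consistent = sum(1 for answers in responses.values() if len(set(answers)) == 1)
    return consistent / len(responses)


# Hypothetical toy data: 2 questions, each asked 9 times (3 times a day for 3 days).
answer_key = {"q1": "A", "q2": "C"}
responses = {
    "q1": ["A"] * 9,                                        # always correct, always the same
    "q2": ["C", "C", "B", "C", "C", "C", "C", "C", "C"],    # one deviating response
}

print(f"accuracy    = {accuracy(responses, answer_key):.3f}")
print(f"consistency = {consistency(responses):.3f}")
```

Under these assumed definitions, a single deviating response lowers accuracy only slightly but marks the whole question as inconsistent, which is one reason the two measures can behave differently.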

Results

The accuracy rate of ChatGPT-4o (92.8%) was significantly higher than that of ChatGPT-4 (81.7%; p < .001). The question group affected the accuracy rates of both AIs (p < .001). The time at which the questions were asked did not affect the accuracy of either AI (p > .05). There was no statistically significant difference in the consistency rate between ChatGPT-4 and ChatGPT-4o (p = .123). Nor did the question group affect the consistency of either AI (p > .05).
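
As a rough plausibility check on the accuracy comparison, the reported percentages can be run through a two-proportion z-test. This is only a sketch under stated assumptions: the abstract does not say which test the authors used, the total of 540 responses per model is inferred from the design (60 questions, 3 times a day, 3 days), the correct-response counts are back-calculated from the rounded percentages, and the responses are treated as independent even though repeated answers to the same question are likely correlated.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions (pooled variance)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumption: 60 questions x 3 times a day x 3 days = 540 responses per model.
n = 60 * 3 * 3

# Assumption: counts back-calculated from the rounded accuracy rates in the abstract.
correct_4o = round(0.928 * n)
correct_4 = round(0.817 * n)

z, p = two_proportion_z_test(correct_4o, n, correct_4, n)
print(f"z = {z:.2f}, p = {p:.2e}")
```

A p-value well below .001 from this simplified test is in line with the reported result, but it does not reproduce the authors' analysis.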

Conclusions

According to the results of this study, the accuracy of ChatGPT-4o was better than that of ChatGPT-4. These findings demonstrate that AI chatbots can be used in dental education. However, the limitations and potential risks associated with AI must also be considered.

Source journal

International Endodontic Journal (Medicine: Dentistry and Oral Surgery)
CiteScore: 10.20
Self-citation rate: 28.00%
Articles published: 195
Review time: 4-8 weeks

About the journal: The International Endodontic Journal is published monthly and strives to publish original articles of the highest quality to disseminate scientific and clinical knowledge; all manuscripts are subjected to peer review. Original scientific articles are published in the areas of biomedical science, applied materials science, bioengineering, epidemiology and social science relevant to endodontic disease and its management, and to the restoration of root-treated teeth. In addition, review articles, reports of clinical cases, book reviews, summaries and abstracts of scientific meetings, and news items are accepted. The International Endodontic Journal is essential reading for general dental practitioners, specialist endodontists, research scientists and dental teachers.