Evidence-Based Potential of Generative Artificial Intelligence Large Language Models on Dental Avulsion: ChatGPT Versus Gemini.

Dental Traumatology (impact factor 2.3; JCR Q2, Dentistry, Oral Surgery & Medicine; Zone 3, Medicine). Published 2024-11-02. DOI: 10.1111/edt.12999
Taibe Tokgöz Kaplan, Muhammet Cankar
Citations: 0

Abstract

Background: This study comparatively evaluated the accuracy and comprehensiveness of the answers that two artificial intelligence-based large language models (LLMs), ChatGPT and Gemini, gave to questions about dental avulsion.

Materials and methods: Based on the guidelines of the International Association of Dental Traumatology (IADT), a total of 33 questions about dental avulsion were prepared. They comprised multiple-choice, binary (true/false), and open-ended questions and covered both technical questions and patient questions. Each question was posed to ChatGPT and to Gemini. The responses were recorded and scored by four pediatric dentists. Statistical analyses, including intraclass correlation coefficient (ICC) analysis, were performed to assess inter-rater agreement and the accuracy of the responses. The significance level was set at p < 0.050.
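The abstract states only that ICC analysis was used to assess agreement among the four raters. As an illustration, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater). The choice of this form, the function name `icc_2_1`, and the example ratings are assumptions for illustration, not details or data from the study.

```python
# Minimal sketch of ICC(2,1): two-way random effects, absolute agreement,
# single rater. ASSUMPTION: the abstract does not say which ICC form was used.

def icc_2_1(scores):
    """scores[i][j] is the score rater j gave to response i (hypothetical layout)."""
    n = len(scores)        # number of rated responses (questions)
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between responses
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Shrout-Fleiss ICC(2,1) formula.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: 5 responses rated by 4 raters (NOT the study's data).
ratings = [
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 4, 3],
    [1, 1, 2, 1],
    [3, 3, 3, 4],
]
print(round(icc_2_1(ratings), 3))
```

Values near 1 indicate that raters agree closely on absolute scores; values near 0 indicate that differences between raters are as large as differences between responses.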

Results: The mean score of Gemini was statistically significantly higher than that of ChatGPT (p = 0.001). ChatGPT gave more correct answers to the open-ended and true/false (T/F) questions on dental avulsion and showed its lowest accuracy on the multiple-choice questions (MCQs). For Gemini, the median scores did not differ significantly across question types (p = 0.088). When responses were pooled across question types and compared with the Mann-Whitney U test, Gemini's answers were statistically significantly more accurate (p = 0.004).
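Because rater scores are ordinal, a rank-based test such as the Mann-Whitney U test suits the pooled comparison reported above. The sketch below shows how the test works; it assumes a two-sided test using the normal approximation with tie and continuity corrections, which the abstract does not specify. The function name `mann_whitney_u` and the score lists are hypothetical, not the study's data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation with tie and
    continuity corrections). Returns (U statistic for x, p-value)."""
    values = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign 1-based ranks, giving tied values their average (mid)rank.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of t^3 - t over groups of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    r1 = sum(ranks[:n1])              # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2                  # mean of U under the null hypothesis
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                      # every value tied: no evidence of a shift
        return u1, 1.0
    z = max(abs(u1 - mu) - 0.5, 0.0) / math.sqrt(var)
    p = math.erfc(z / math.sqrt(2))   # two-sided standard-normal tail
    return u1, p


# Hypothetical pooled scores for two models (NOT the study's data).
model_a = [5, 4, 5, 5, 4, 3, 5]
model_b = [3, 4, 2, 5, 3, 3, 4]
print(mann_whitney_u(model_a, model_b))
```

For real analyses, SciPy's `scipy.stats.mannwhitneyu` provides a maintained implementation; its default may compute an exact p-value for small samples without ties, so results can differ slightly from this approximation.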

Conclusions: When evaluated against the IADT guidelines for dental avulsion, both Gemini and ChatGPT show clear promise. However, further research, clinical validation, and improvements to the models are imperative before LLMs can be successfully incorporated into clinical practice.

Source journal: Dental Traumatology (Medicine: Dentistry & Oral Surgery)
CiteScore: 6.40
Self-citation rate: 32.00%
Articles published: 85
Review time: 6-12 weeks
About the journal: Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology, including the following topics:
- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics / Prosthetics / Restorative
- Evidence-Based Traumatology & Study Design
- Oral & Maxillofacial Surgery / Transplant / Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects
The journal's aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.