{"title":"Performance of Artificial Intelligence Chatbots in Responding to Patient Queries Related to Traumatic Dental Injuries: A Comparative Study.","authors":"Yeliz Guven, Omer Tarik Ozdemir, Melis Yazir Kavan","doi":"10.1111/edt.13020","DOIUrl":null,"url":null,"abstract":"<p><strong>Background/aim: </strong>Artificial intelligence (AI) chatbots have become increasingly prevalent in recent years as potential sources of online healthcare information for patients when making medical/dental decisions. This study assessed the readability, quality, and accuracy of responses provided by three AI chatbots to questions related to traumatic dental injuries (TDIs), either retrieved from popular question-answer sites or manually created based on the hypothetical case scenarios.</p><p><strong>Materials and methods: </strong>A total of 59 traumatic injury queries were directed at ChatGPT 3.5, ChatGPT 4.0, and Google Gemini. Readability was evaluated using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. To assess response quality and accuracy, the DISCERN tool, Global Quality Score (GQS), and misinformation scores were used. The understandability and actionability of the responses were analyzed using the Patient Education Materials Assessment Tool for Printed Materials (PEMAT-P) tool. Statistical analysis included Kruskal-Wallis with Dunn's post hoc test for non-normal variables, and one-way ANOVA with Tukey's post hoc test for normal variables (p < 0.05).</p><p><strong>Results: </strong>The mean FKGL and FRE scores for ChatGPT 3.5, ChatGPT 4.0, and Google Gemini were 11.2 and 49.25, 11.8 and 46.42, and 10.1 and 51.91, respectively, indicating that the responses were difficult to read and required a college-level reading ability. ChatGPT 3.5 had the lowest DISCERN and PEMAT-P understandability scores among the chatbots (p < 0.001). ChatGPT 4.0 and Google Gemini were rated higher for quality (GQS score of 5) compared to ChatGPT 3.5 (p < 0.001).</p><p><strong>Conclusions: </strong>In this study, ChatGPT 3.5, although widely used, provided some misleading and inaccurate responses to questions about TDIs. In contrast, ChatGPT 4.0 and Google Gemini generated more accurate and comprehensive answers, making them more reliable as auxiliary information sources. However, for complex issues like TDIs, no chatbot can replace a dentist for diagnosis, treatment, and follow-up care.</p>","PeriodicalId":55180,"journal":{"name":"Dental Traumatology","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Dental Traumatology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/edt.13020","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0
Abstract
Background/aim: Artificial intelligence (AI) chatbots have become increasingly prevalent in recent years as potential sources of online healthcare information for patients making medical/dental decisions. This study assessed the readability, quality, and accuracy of responses provided by three AI chatbots to questions related to traumatic dental injuries (TDIs), either retrieved from popular question-and-answer sites or manually created based on hypothetical case scenarios.
Materials and methods: A total of 59 traumatic injury queries were directed at ChatGPT 3.5, ChatGPT 4.0, and Google Gemini. Readability was evaluated using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. Response quality and accuracy were assessed using the DISCERN tool, the Global Quality Score (GQS), and misinformation scores. The understandability and actionability of the responses were analyzed using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P). Statistical analysis included the Kruskal-Wallis test with Dunn's post hoc test for non-normally distributed variables and one-way ANOVA with Tukey's post hoc test for normally distributed variables (p < 0.05).
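For context, both readability indices are standard functions of word, sentence, and syllable counts: FRE = 206.835 - 1.015 (words/sentences) - 84.6 (syllables/words), and FKGL = 0.39 (words/sentences) + 11.8 (syllables/words) - 15.59. The sketch below shows how scores of this kind can be computed in Python with the open-source textstat package; the sample response text is a hypothetical placeholder, not a quote from the study.

```python
# Minimal sketch: scoring a chatbot response for readability with textstat.
# The response text is a hypothetical placeholder, not data from the study.
import textstat

response = (
    "If a permanent tooth is knocked out, pick it up by the crown, "
    "rinse it briefly with milk or saline, and reinsert it into the "
    "socket if possible. See a dentist immediately."
)

# Flesch Reading Ease: higher = easier (60-70 is roughly plain English).
fre = textstat.flesch_reading_ease(response)

# Flesch-Kincaid Grade Level: approximate US school grade required.
fkgl = textstat.flesch_kincaid_grade(response)

print(f"FRE:  {fre:.2f}")
print(f"FKGL: {fkgl:.2f}")
```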
Results: The mean FKGL and FRE scores were 11.2 and 49.25 for ChatGPT 3.5, 11.8 and 46.42 for ChatGPT 4.0, and 10.1 and 51.91 for Google Gemini, indicating that the responses were difficult to read and required college-level reading ability. ChatGPT 3.5 had the lowest DISCERN and PEMAT-P understandability scores among the chatbots (p < 0.001). ChatGPT 4.0 and Google Gemini were rated higher for quality (GQS score of 5) than ChatGPT 3.5 (p < 0.001).
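The pairwise comparisons reported above follow the two-branch plan from the Methods. Below is a minimal sketch of that pipeline, assuming hypothetical per-chatbot DISCERN totals (the study's raw data are not reproduced here): Kruskal-Wallis with Dunn's post hoc test via SciPy and scikit-posthocs for non-normal variables, and one-way ANOVA with Tukey's post hoc test via statsmodels for normal variables.

```python
# Illustrative sketch of the statistical comparisons described in the
# Methods; all score arrays are hypothetical placeholders, not study data.
import numpy as np
from scipy.stats import kruskal, f_oneway
import scikit_posthocs as sp
from statsmodels.stats.multicomp import pairwise_tukeyhsd

gpt35  = np.array([42, 45, 38, 40, 44, 39, 41])  # hypothetical DISCERN totals
gpt40  = np.array([58, 61, 55, 60, 57, 59, 62])
gemini = np.array([56, 59, 54, 58, 60, 55, 57])

# Branch 1 (non-normally distributed variables): Kruskal-Wallis,
# then Dunn's post hoc test with Bonferroni adjustment.
h, p = kruskal(gpt35, gpt40, gemini)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")
print(sp.posthoc_dunn([gpt35, gpt40, gemini], p_adjust="bonferroni"))

# Branch 2 (normally distributed variables): one-way ANOVA,
# then Tukey's HSD post hoc test. In practice each variable would go
# through only one branch, chosen by a normality check.
f, p = f_oneway(gpt35, gpt40, gemini)
print(f"One-way ANOVA: F = {f:.2f}, p = {p:.4f}")
scores = np.concatenate([gpt35, gpt40, gemini])
groups = (["ChatGPT 3.5"] * len(gpt35) + ["ChatGPT 4.0"] * len(gpt40)
          + ["Google Gemini"] * len(gemini))
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```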
Conclusions: In this study, ChatGPT 3.5, although widely used, provided some misleading and inaccurate responses to questions about TDIs. In contrast, ChatGPT 4.0 and Google Gemini generated more accurate and comprehensive answers, making them more reliable as auxiliary information sources. However, for complex issues like TDIs, no chatbot can replace a dentist for diagnosis, treatment, and follow-up care.
Journal introduction:
Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology. This includes the following topics:
- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics/Prosthetics/Restorative
- Evidence-Based Traumatology & Study Design
- Oral & Maxillofacial Surgery/Transplant/Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects
The journal"s aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.