A Comparative Analysis of Responses of Artificial Intelligence Chatbots in Special Needs Dentistry.

Pediatric Dentistry, 46(5): 337-344. Pub Date: 2024-09-15
Rata Rokhshad, Mouada Fadul, Guihua Zhai, Kimberly Carr, Janice G Jackson, Ping Zhang
{"title":"A Comparative Analysis of Responses of Artificial Intelligence Chatbots in Special Needs Dentistry.","authors":"Rata Rokhshad, Mouada Fadul, Guihua Zhai, Kimberly Carr, Janice G Jackson, Ping Zhang","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p><b>Purpose:</b> To evaluate the accuracy and consistency of chatbots in answering questions related to special needs dentistry. <b>Methods:</b> Nine publicly accessible chatbots, including Google Bard, ChatGPT 4, ChatGPT 3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google PaLM, were evaluated on their ability to answer a set of 25 true/false questions related to special needs dentistry and 15 questions for syndrome diagnosis based on their oral manifestations. Each chatbot was asked independently three times at a three-week interval from November to December 2023, and the responses were evaluated by dental professionals. The Wilcoxon exact test was used to compare accuracy rates among the chatbots while Cronbach's alpha was utilized to measure the consistency of the chatbots' responses. <b>Results:</b> Chatbots had an average accuracy of 55??4 percent in answering all questions, 37±6 percent in diagnosis, and 67±8 percent in answering true/false questions. No significant difference (P>0.05) in the accuracy proportion was detected between any pairwise chatbot comparison. All chatbots demonstrated acceptable reliability (Cronbach's alpha greater than 0.7), with Claude instant having the highest reliability of 0.93. <b>Conclusion:</b> Chatbots exhibit acceptable consistency in responding to questions related to special needs dentistry and better accuracy in responding to true/false questions than diagnostic questions. The clinical relevance is not fully established at this stage, but it may become a useful tool in the future.</p>","PeriodicalId":101357,"journal":{"name":"Pediatric dentistry","volume":"46 5","pages":"337-344"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pediatric dentistry","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: To evaluate the accuracy and consistency of chatbots in answering questions related to special needs dentistry. Methods: Nine publicly accessible chatbots (Google Bard, ChatGPT 4, ChatGPT 3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google PaLM) were evaluated on their ability to answer a set of 25 true/false questions related to special needs dentistry and 15 syndrome-diagnosis questions based on oral manifestations. Each chatbot was queried independently three times, at three-week intervals, from November to December 2023, and the responses were graded by dental professionals. The Wilcoxon exact test was used to compare accuracy rates among the chatbots, while Cronbach's alpha was used to measure the consistency of their responses. Results: Chatbots had an average accuracy of 55±4 percent in answering all questions, 37±6 percent on diagnostic questions, and 67±8 percent on true/false questions. No significant difference (P>0.05) in accuracy was detected in any pairwise comparison between chatbots. All chatbots demonstrated acceptable reliability (Cronbach's alpha greater than 0.7), with Claude-instant having the highest reliability at 0.93. Conclusion: Chatbots exhibit acceptable consistency in responding to questions related to special needs dentistry and better accuracy on true/false questions than on diagnostic questions. Their clinical relevance is not fully established at this stage, but chatbots may become useful tools in the future.
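For readers who want to see how the reliability measure works, the sketch below computes per-round accuracy and Cronbach's alpha, alpha = k/(k-1) * (1 - sum of per-round variances / variance of per-question totals), over a questions-by-rounds matrix of graded responses. This is a minimal illustration under assumed data: the simulated 40 x 3 grade matrix, the 10 percent flip rate, and the helper name cronbach_alpha are hypothetical and not taken from the study.

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    # scores: (n_questions, k_rounds) matrix of 0/1 correctness grades.
    # alpha = k/(k-1) * (1 - sum(round variances) / variance(question totals))
    k = scores.shape[1]
    round_vars = scores.var(axis=0, ddof=1)      # variance of each repeated round
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of per-question total scores
    return k / (k - 1) * (1 - round_vars.sum() / total_var)

# Hypothetical data: 40 questions (25 true/false + 15 diagnostic) asked in 3 rounds.
rng = np.random.default_rng(0)
base = rng.random(40) < 0.55                     # assumed per-question correctness
rounds = np.stack([(base ^ (rng.random(40) < 0.1)).astype(int) for _ in range(3)], axis=1)
print("accuracy per round:", rounds.mean(axis=0))
print("Cronbach's alpha:  ", round(cronbach_alpha(rounds), 2))

An alpha above 0.7, the threshold the abstract cites, would indicate that a chatbot's three repeated answers to each question agree to an acceptable degree.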
