A Comparative Analysis of Responses of Artificial Intelligence Chatbots in Special Needs Dentistry
Rata Rokhshad, Mouada Fadul, Guihua Zhai, Kimberly Carr, Janice G Jackson, Ping Zhang
Pediatric Dentistry, 46(5): 337-344. Published 2024-09-15.
Abstract
Purpose: To evaluate the accuracy and consistency of chatbots in answering questions related to special needs dentistry. Methods: Nine publicly accessible chatbots (Google Bard, ChatGPT 4, ChatGPT 3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google PaLM) were evaluated on their ability to answer a set of 25 true/false questions related to special needs dentistry and 15 questions asking for a syndrome diagnosis based on oral manifestations. Each chatbot was asked independently three times at three-week intervals from November to December 2023, and the responses were evaluated by dental professionals. The Wilcoxon exact test was used to compare accuracy rates among the chatbots, while Cronbach's alpha was used to measure the consistency of the chatbots' responses. Results: Chatbots had an average accuracy of 55±4 percent across all questions, 37±6 percent on diagnostic questions, and 67±8 percent on true/false questions. No significant difference (P>0.05) in accuracy was detected in any pairwise chatbot comparison. All chatbots demonstrated acceptable reliability (Cronbach's alpha greater than 0.7), with Claude-instant having the highest reliability at 0.93. Conclusion: Chatbots exhibit acceptable consistency in responding to questions related to special needs dentistry and better accuracy on true/false questions than on diagnostic questions. Their clinical relevance is not fully established at this stage, but they may become a useful tool in the future.
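The abstract names the statistical tools but does not show how they were applied. The sketch below is not the authors' code; it is a minimal illustration, under assumptions, of how Cronbach's alpha across the three asking rounds and a pairwise Wilcoxon comparison of per-question accuracy could be computed. The 0/1 scored response matrices, the 0.67 and 0.55 accuracy rates, and the use of the paired signed-rank form of the Wilcoxon test are illustrative assumptions, not details taken from the study.

```python
import numpy as np
from scipy.stats import wilcoxon


def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (questions x repetitions) score matrix.

    Rows are questions, columns are the repeated asking rounds;
    entries are 1 for a correct response and 0 for an incorrect one.
    """
    k = scores.shape[1]                           # number of repetitions (3 in the study)
    item_vars = scores.var(axis=0, ddof=1)        # variance of each repetition round
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of per-question total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)


# Hypothetical scored responses: 40 questions x 3 rounds for two illustrative chatbots.
rng = np.random.default_rng(0)
bot_a = (rng.random((40, 3)) < 0.67).astype(int)
bot_b = (rng.random((40, 3)) < 0.55).astype(int)

print(f"Cronbach's alpha, bot A: {cronbach_alpha(bot_a):.2f}")
print(f"Cronbach's alpha, bot B: {cronbach_alpha(bot_b):.2f}")

# Paired comparison of per-question accuracy (mean over the three rounds) between the
# two bots, analogous to the pairwise accuracy comparisons reported in the abstract.
stat, p = wilcoxon(bot_a.mean(axis=1), bot_b.mean(axis=1))
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.3f}")
```

In this toy setup the reliability coefficient is driven by how often a bot gives the same (correct or incorrect) answer across the three rounds, which mirrors the consistency notion the abstract reports (alpha > 0.7 for all chatbots, 0.93 for Claude-instant).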