{"title":"Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry.","authors":"Zeyneb Merve Ozdemir, Emre Yapici","doi":"10.1111/jerd.13447","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>This study aimed to evaluate the reliability, consistency, and readability of responses provided by various artificial intelligence (AI) programs to questions related to Restorative Dentistry.</p><p><strong>Materials and methods: </strong>Forty-five knowledge-based information and 20 questions (10 patient-related and 10 dentistry-specific) were posed to ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Chatsonic, Copilot, and Gemini Advanced chatbots. The DISCERN questionnaire was used to assess the reliability; Flesch Reading Ease and Flesch-Kincaid Grade Level scores were utilized to evaluate readability. Accuracy and consistency were determined based on the chatbots' responses to the knowledge-based questions.</p><p><strong>Results: </strong>ChatGPT-4, ChatGPT-4o, Chatsonic, and Copilot demonstrated \"good\" reliability, while ChatGPT-3.5 and Gemini Advanced showed \"fair\" reliability. Chatsonic exhibited the highest \"DISCERN total score\" for patient-related questions, while ChatGPT-4o performed best for dentistry-specific questions. No significant differences were found in readability among the chatbots (p > 0.05). ChatGPT-4o showed the highest accuracy (93.3%) for knowledge-based questions, while Copilot had the lowest (68.9%). ChatGPT-4 demonstrated the highest consistency between repetitions.</p><p><strong>Conclusion: </strong>Performance of AIs varied in terms of accuracy, reliability, consistency, and readability when responding to Restorative Dentistry questions. ChatGPT-4o and Chatsonic showed promising results for academic and patient education applications. However, the readability of responses was generally above recommended levels for patient education materials.</p><p><strong>Clinical significance: </strong>The utilization of AI has an increasing impact on various aspects of dentistry. Moreover, if the responses to patient-related and dentistry-specific questions in restorative dentistry prove to be reliable and comprehensible, this may yield promising outcomes for the future.</p>","PeriodicalId":15988,"journal":{"name":"Journal of Esthetic and Restorative Dentistry","volume":" ","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2025-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Esthetic and Restorative Dentistry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/jerd.13447","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Abstract
Objective: This study aimed to evaluate the accuracy, reliability, consistency, and readability of responses provided by various artificial intelligence (AI) programs to questions related to Restorative Dentistry.
Materials and methods: Forty-five knowledge-based questions and 20 additional questions (10 patient-related and 10 dentistry-specific) were posed to the ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Chatsonic, Copilot, and Gemini Advanced chatbots. The DISCERN questionnaire was used to assess reliability; Flesch Reading Ease and Flesch-Kincaid Grade Level scores were used to evaluate readability. Accuracy and consistency were determined from the chatbots' responses to the knowledge-based questions.
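For reference, Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) are computed from word, sentence, and syllable counts; the standard published formulas below are shown for illustration and are assumed here, since the abstract does not restate the exact implementation used:

\[ \mathrm{FRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right) \]

\[ \mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59 \]

Higher FRE scores indicate easier text, while FKGL approximates the U.S. school grade level needed to understand it; patient education materials are commonly recommended to be written at roughly a sixth-grade reading level or below.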
Results: ChatGPT-4, ChatGPT-4o, Chatsonic, and Copilot demonstrated "good" reliability, while ChatGPT-3.5 and Gemini Advanced showed "fair" reliability. Chatsonic exhibited the highest "DISCERN total score" for patient-related questions, while ChatGPT-4o performed best for dentistry-specific questions. No significant differences were found in readability among the chatbots (p > 0.05). ChatGPT-4o showed the highest accuracy (93.3%) for knowledge-based questions, while Copilot had the lowest (68.9%). ChatGPT-4 demonstrated the highest consistency between repetitions.
Conclusion: The performance of the AI chatbots varied in terms of accuracy, reliability, consistency, and readability when responding to Restorative Dentistry questions. ChatGPT-4o and Chatsonic showed promising results for academic and patient-education applications. However, the readability of responses was generally above the levels recommended for patient education materials.
Clinical significance: The use of AI has an increasing impact on various aspects of dentistry. If responses to patient-related and dentistry-specific questions in restorative dentistry prove reliable and comprehensible, this may yield promising outcomes for the future.
Journal introduction:
The Journal of Esthetic and Restorative Dentistry (JERD) is the longest-standing peer-reviewed journal devoted solely to advancing the knowledge and practice of esthetic dentistry. Its goal is to provide the very latest evidence-based information in the realm of contemporary interdisciplinary esthetic dentistry through high-quality clinical papers, sound research reports, and educational features.
The range of topics covered in the journal includes:
- Interdisciplinary esthetic concepts
- Implants
- Conservative adhesive restorations
- Tooth whitening
- Prosthodontic materials and techniques
- Dental materials
- Orthodontic, periodontal and endodontic esthetics
- Esthetics-related research
- Innovations in esthetics