{"title":"Artificial intelligence in dental education: ChatGPT's performance on the periodontic in-service examination","authors":"Arman Danesh, Hirad Pazouki, Farzad Danesh, Arsalan Danesh, Saynur Vardar-Sengul","doi":"10.1002/JPER.23-0514","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>ChatGPT's (Chat Generative Pre-Trained Transformer) remarkable capacity to generate human-like output makes it an appealing learning tool for healthcare students worldwide. Nevertheless, the chatbot's responses may be subject to inaccuracies, putting forth an intense risk of misinformation. ChatGPT's capabilities should be examined in every corner of healthcare education, including dentistry and its specialties, to understand the potential of misinformation associated with the chatbot's use as a learning tool. Our investigation aims to explore ChatGPT's foundation of knowledge in the field of periodontology by evaluating the chatbot's performance on questions obtained from an in-service examination administered by the American Academy of Periodontology (AAP).</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>ChatGPT3.5 and ChatGPT4 were evaluated on 311 multiple-choice questions obtained from the 2023 in-service examination administered by the AAP. The dataset of in-service examination questions was accessed through Nova Southeastern University's Department of Periodontology. Our study excluded questions containing an image as ChatGPT does not accept image inputs.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>ChatGPT3.5 and ChatGPT4 answered 57.9% and 73.6% of in-service questions correctly on the 2023 Periodontics In-Service Written Examination, respectively. A two-tailed <i>t</i> test was incorporated to compare independent sample means, and sample proportions were compared using a two-tailed χ<sup>2</sup> test. 
A <i>p</i> value below the threshold of 0.05 was deemed statistically significant.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>While ChatGPT4 showed a higher proficiency compared to ChatGPT3.5, both chatbot models leave considerable room for misinformation with their responses relating to periodontology. The findings of the study encourage residents to scrutinize the periodontic information generated by ChatGPT to account for the chatbot's current limitations.</p>\n </section>\n </div>","PeriodicalId":16716,"journal":{"name":"Journal of periodontology","volume":"95 7","pages":"682-687"},"PeriodicalIF":4.2000,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of periodontology","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/JPER.23-0514","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Abstract
Background
ChatGPT's (Chat Generative Pre-Trained Transformer) remarkable capacity to generate human-like output makes it an appealing learning tool for healthcare students worldwide. Nevertheless, the chatbot's responses may contain inaccuracies, posing a serious risk of misinformation. ChatGPT's capabilities should be examined in every corner of healthcare education, including dentistry and its specialties, to understand the potential for misinformation associated with the chatbot's use as a learning tool. Our investigation aims to explore ChatGPT's foundation of knowledge in the field of periodontology by evaluating the chatbot's performance on questions obtained from an in-service examination administered by the American Academy of Periodontology (AAP).
Methods
ChatGPT3.5 and ChatGPT4 were evaluated on 311 multiple-choice questions obtained from the 2023 in-service examination administered by the AAP. The dataset of in-service examination questions was accessed through Nova Southeastern University's Department of Periodontology. Our study excluded questions containing an image, as ChatGPT does not accept image inputs.
Results
ChatGPT3.5 and ChatGPT4 answered 57.9% and 73.6% of questions correctly on the 2023 Periodontics In-Service Written Examination, respectively. A two-tailed t test was used to compare independent sample means, and sample proportions were compared using a two-tailed χ2 test. A p value below the threshold of 0.05 was deemed statistically significant.
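The two-proportion comparison described above can be sketched as follows. This is a minimal illustration, not the authors' actual analysis script: the correct-answer counts are reconstructed from the reported percentages and the 311-question set, the assumption that both models answered all 311 questions is ours, and the statistical software used in the study is not stated in the abstract.

```python
# Sketch of a two-tailed chi-squared comparison of the two models'
# correct-answer proportions, assuming both answered all 311 questions.
from scipy.stats import chi2_contingency

TOTAL = 311
correct_35 = round(0.579 * TOTAL)  # ChatGPT3.5: 57.9% correct
correct_4 = round(0.736 * TOTAL)   # ChatGPT4:   73.6% correct

# 2x2 contingency table: rows = model, columns = [correct, incorrect]
table = [
    [correct_35, TOTAL - correct_35],
    [correct_4, TOTAL - correct_4],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
print("significant at alpha = 0.05" if p < 0.05 else "not significant")
```

Under these assumed counts, the gap between 57.9% and 73.6% over two samples of 311 is well below the 0.05 threshold, consistent with the abstract's conclusion that ChatGPT4 outperformed ChatGPT3.5.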
Conclusion
While ChatGPT4 showed higher proficiency than ChatGPT3.5, both chatbot models leave considerable room for misinformation in their responses relating to periodontology. The findings of the study encourage residents to scrutinize periodontic information generated by ChatGPT in light of the chatbot's current limitations.