{"title":"Evaluation of Information Provided by ChatGPT Versions on Traumatic Dental Injuries for Dental Students and Professionals.","authors":"Zeynep Öztürk, Cenkhan Bal, Beyza Nur Çelikkaya","doi":"10.1111/edt.13042","DOIUrl":null,"url":null,"abstract":"<p><strong>Background/aim: </strong>The use of AI-driven chatbots for accessing medical information is increasingly popular among educators and students. This study aims to assess two different ChatGPT models-ChatGPT 3.5 and ChatGPT 4.0-regarding their responses to queries about traumatic dental injuries, specifically for dental students and professionals.</p><p><strong>Material and methods: </strong>A total of 40 questions were prepared, divided equally between those concerning definitions and diagnosis and those on treatment and follow-up. The responses from both ChatGPT versions were evaluated on several criteria: quality, reliability, similarity, and readability. These evaluations were conducted using the Global Quality Scale (GQS), the Reliability Scoring System (adapted DISCERN), the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Reading Grade Level (FKRGL), and the Similarity Index. Normality was checked with the Shapiro-Wilk test, and variance homogeneity was assessed using the Levene test.</p><p><strong>Results: </strong>The analysis revealed that ChatGPT 3.5 provided more original responses compared to ChatGPT 4.0. According to FRES scores, both versions were challenging to read, with ChatGPT 3.5 having a higher FRES score (39.732 ± 9.713) than ChatGPT 4.0 (34.813 ± 9.356), indicating relatively better readability. There were no significant differences between the ChatGPT versions regarding GQS, DISCERN, and FKRGL scores. However, in the definition and diagnosis section, ChatGPT 4.0 had a statistically higher quality score than ChatGPT 3.5. In contrast, ChatGPT 3.5 provided more original answers in the treatment and follow-up section. For ChatGPT 4.0, the readability and similarity rates for the definition and diagnosis section were higher than those for the treatment and follow-up section. No significant differences were observed between ChatGPT 3.5's DISCERN, FRES, FKRGL, and similarity index measurements by topic.</p><p><strong>Conclusions: </strong>Both ChatGPT versions offer high-quality and original information, though they present challenges in readability and reliability. They are valuable resources for dental students and professionals but should be used in conjunction with additional sources of information for a comprehensive understanding.</p>","PeriodicalId":55180,"journal":{"name":"Dental Traumatology","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Dental Traumatology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/edt.13042","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
引用次数: 0
Abstract
Background/aim: The use of AI-driven chatbots for accessing medical information is increasingly popular among educators and students. This study aims to assess two ChatGPT models, ChatGPT 3.5 and ChatGPT 4.0, with respect to their responses to queries about traumatic dental injuries, specifically for dental students and professionals.
Material and methods: A total of 40 questions were prepared, divided equally between those concerning definitions and diagnosis and those on treatment and follow-up. The responses from both ChatGPT versions were evaluated on several criteria: quality, reliability, similarity, and readability. These evaluations were conducted using the Global Quality Scale (GQS), the Reliability Scoring System (adapted DISCERN), the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Reading Grade Level (FKRGL), and the Similarity Index. Normality was checked with the Shapiro-Wilk test, and variance homogeneity was assessed using the Levene test.
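The abstract does not include the authors' scoring scripts. As a rough illustration only, the sketch below computes the two readability formulas named above (FRES and FKRGL, using their standard published coefficients) with a naive vowel-group syllable counter, and runs the Shapiro-Wilk and Levene checks via SciPy; the sample score lists are hypothetical placeholders, not study data.

```python
import re
from scipy import stats

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease Score
    fkrgl = 0.39 * wps + 11.8 * spw - 15.59     # Flesch-Kincaid Reading Grade Level
    return fres, fkrgl

# Hypothetical per-response FRES scores for each model version.
fres_35 = [41.2, 38.9, 44.0, 35.7]
fres_40 = [33.1, 36.4, 31.8, 37.9]

# Normality (Shapiro-Wilk) and variance homogeneity (Levene), as in the abstract.
print(stats.shapiro(fres_35), stats.shapiro(fres_40))
print(stats.levene(fres_35, fres_40))
```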
Results: The analysis revealed that ChatGPT 3.5 provided more original responses than ChatGPT 4.0. According to FRES scores, both versions were challenging to read, with ChatGPT 3.5 having a higher FRES score (39.732 ± 9.713) than ChatGPT 4.0 (34.813 ± 9.356), indicating relatively better readability. There were no significant differences between the ChatGPT versions regarding GQS, DISCERN, and FKRGL scores. However, in the definition and diagnosis section, ChatGPT 4.0 had a statistically higher quality score than ChatGPT 3.5. In contrast, ChatGPT 3.5 provided more original answers in the treatment and follow-up section. For ChatGPT 4.0, the readability and similarity rates for the definition and diagnosis section were higher than those for the treatment and follow-up section. For ChatGPT 3.5, the DISCERN, FRES, FKRGL, and similarity index measurements did not differ significantly by topic.
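The abstract reports significance comparisons between the two versions but does not name the comparison test used. A minimal sketch of how such a two-group comparison is commonly run after the normality check described above (independent t-test when scores look normal, Mann-Whitney U otherwise) is shown below; the score lists are hypothetical, and this convention is an assumption rather than the authors' documented procedure.

```python
from scipy import stats

# Hypothetical per-question GQS scores for the two versions.
gqs_35 = [4, 5, 4, 3, 5, 4, 4, 5]
gqs_40 = [5, 5, 4, 5, 4, 5, 5, 4]

# Pick the test based on the Shapiro-Wilk result (a common convention;
# the abstract does not state which comparison test the authors applied).
normal = (stats.shapiro(gqs_35).pvalue > 0.05 and
          stats.shapiro(gqs_40).pvalue > 0.05)
if normal:
    result = stats.ttest_ind(gqs_35, gqs_40)     # parametric comparison
else:
    result = stats.mannwhitneyu(gqs_35, gqs_40)  # nonparametric fallback
print(result)
```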
Conclusions: Both ChatGPT versions offer high-quality and original information, though they present challenges in readability and reliability. They are valuable resources for dental students and professionals but should be used in conjunction with additional sources of information for a comprehensive understanding.
Journal introduction:
Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology. This includes the following topics:
- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics / Prosthetics / Restorative
- Evidence Based Traumatology & Study Design
- Oral & Maxillofacial Surgery/Transplant/Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects
The journal"s aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.