The use of ChatGPT and Google Gemini in responding to orthognathic surgery-related questions: A comparative study

Ahmed A Abdel Aziz, Hams H Abdelrahman, Mohamed G Hassan

Journal of the World Federation of Orthodontists, published online October 25, 2024. DOI: 10.1016/j.ejwf.2024.09.004
Abstract
Aim: This study employed a quantitative approach to compare the reliability of responses provided by ChatGPT-3.5, ChatGPT-4, and Google Gemini to orthognathic surgery-related questions.
Material and methods: The authors adapted a set of 64 questions encompassing all of the domains and aspects related to orthognathic surgery. One author submitted the questions to ChatGPT-3.5, ChatGPT-4, and Google Gemini. The AI-generated responses from the three platforms were recorded and evaluated by two blinded, independent experts. The reliability of the AI-generated responses was assessed using a tool measuring accuracy and completeness of information. In addition, whether each platform provided definitive answers to closed-ended questions, references, graphical elements, and advice to schedule a consultation with a specialist was recorded.
Results: Although ChatGPT-3.5 achieved the highest information reliability score, the three LLMs showed similar reliability scores in responding to orthognathic surgery-related inquiries. Moreover, Google Gemini was significantly more likely to include recommendations to consult a physician and to provide graphical elements; both ChatGPT-3.5 and ChatGPT-4 lacked these features.
Conclusion: This study shows that ChatGPT-3.5, ChatGPT-4, and Google Gemini can provide reliable responses to inquiries about orthognathic surgery. However, Google Gemini stood out by incorporating additional references and illustrations within its responses. These findings highlight the need for further evaluation of AI capabilities across different healthcare domains.