ChatGPT versus a customized AI chatbot (Anatbuddy) for anatomy education: A comparative pilot study
Gautham Arun, Vivek Perumal, Francis Paul John Bato Urias, Yan En Ler, Bryan Wen Tao Tan, Ranganath Vallabhajosyula, Emmanuel Tan, Olivia Ng, Kian Bee Ng, Sreenivasulu Reddy Mogali
Anatomical Sciences Education, 17(7): 1396-1405. Published online 21 August 2024. DOI: 10.1002/ase.2502
Abstract
Large Language Models (LLMs) have the potential to improve education by personalizing learning. However, ChatGPT-generated content has been criticized for sometimes producing false, biased, and/or hallucinatory information. To evaluate AI's ability to return clear and accurate anatomy information, this study generated a custom interactive and intelligent chatbot (Anatbuddy) through an OpenAI Application Programming Interface (API) that enables seamless AI-driven interactions within a secured cloud infrastructure. Anatbuddy was programmed through a Retrieval Augmented Generation (RAG) method to provide context-aware responses to user queries based on a predetermined knowledge base. To compare their outputs, various queries (i.e., prompts) on thoracic anatomy (n = 18) were fed into Anatbuddy and ChatGPT 3.5. A panel comprising three experienced anatomists evaluated both tools' responses for factual accuracy, relevance, completeness, coherence, and fluency on a 5-point Likert scale. These ratings were reviewed by a third party blinded to the study, who revised and finalized scores as needed. Anatbuddy's factual accuracy (mean ± SD = 4.78/5.00 ± 0.43; median = 5.00) was rated significantly higher (U = 84, p = 0.01) than ChatGPT's accuracy (4.11 ± 0.83; median = 4.00). No statistically significant differences were detected between the chatbots for the other variables. Given ChatGPT's current content knowledge limitations, we strongly recommend the anatomy profession develop a custom AI chatbot for anatomy education utilizing a carefully curated knowledge base to ensure accuracy. Further research is needed to determine students' acceptance of custom chatbots for anatomy education and their influence on learning experiences and outcomes.
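The article does not publish Anatbuddy's source code, so the snippet below is only a minimal sketch of the Retrieval Augmented Generation pattern the abstract describes: passages from a curated anatomy knowledge base are embedded once, the passages most similar to a student's query are retrieved, and the model is instructed to answer from that retrieved context only. All identifiers here (knowledge_base, answer_query, the embedding and chat model names) are illustrative assumptions, not the authors' implementation.

```python
# Minimal RAG sketch (illustrative only; not Anatbuddy's actual code).
# Assumes OPENAI_API_KEY is set and `knowledge_base` holds curated anatomy passages.
import numpy as np
from openai import OpenAI

client = OpenAI()

knowledge_base = [
    "The thoracic cavity is bounded by the rib cage and the diaphragm ...",
    "The heart lies in the middle mediastinum, enclosed by the pericardium ...",
    # ... further curated passages from the predetermined knowledge base
]

def embed(texts):
    """Return one embedding vector per input text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# Embed the knowledge base once, up front.
kb_vectors = embed(knowledge_base)

def answer_query(question, k=3):
    """Retrieve the k passages most similar to the query and answer from them only."""
    q_vec = embed([question])[0]
    scores = kb_vectors @ q_vec / (
        np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n\n".join(knowledge_base[i] for i in np.argsort(scores)[::-1][:k])
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer anatomy questions using only the context provided. "
                        "If the context is insufficient, say so.\n\nContext:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer_query("Describe the boundaries of the middle mediastinum."))
```

Constraining the model to retrieved passages is what gives a custom chatbot like this its accuracy advantage over an unconstrained general-purpose model: answers are grounded in the curated material rather than in whatever the base model recalls.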
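The group comparison reported in the abstract (U = 84, p = 0.01 for factual accuracy) is consistent with a Mann-Whitney U test on the two sets of finalized Likert ratings for the 18 prompts. The raw ratings are not published, so the sketch below uses hypothetical scores purely to show how such a test is run; it is not the study's data or analysis script.

```python
# Hypothetical ratings for illustration only; the study's raw data are not published.
from scipy.stats import mannwhitneyu

anatbuddy_accuracy = [5, 5, 5, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, 5]  # 18 prompts
chatgpt_accuracy   = [4, 4, 5, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4, 5, 4, 3, 4, 5]

u_stat, p_value = mannwhitneyu(anatbuddy_accuracy, chatgpt_accuracy, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```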
About the journal
Anatomical Sciences Education, affiliated with the American Association for Anatomy, serves as an international platform for sharing ideas, innovations, and research related to education in anatomical sciences. Covering gross anatomy, embryology, histology, and neurosciences, the journal addresses education at various levels, including undergraduate, graduate, post-graduate, allied health, medical (both allopathic and osteopathic), and dental. It fosters collaboration and discussion in the field of anatomical sciences education.