Introduction: The use of large language models (LLMs) such as ChatGPT to generate multiple-choice questions (MCQs) for medical and dental education is rapidly increasing. However, the educational validity, cognitive depth, and practical usability of AI-generated questions remain underexplored in dental education. This study aimed to evaluate the performance of ChatGPT-4o in generating MCQs for pre-doctoral oral and maxillofacial radiology curricula.
Methods: ChatGPT-4o was prompted to generate 100 multiple-choice questions based on lecture materials from a pre-doctoral oral and maxillofacial radiology course. A panel of expert oral radiologists independently evaluated the quality of the questions, answers, and explanations. Additionally, a randomly selected subset of 64 MCQs was assessed by human experts and an AI-detection tool to determine the accuracy of source identification.
Results: Experts rated 43% of AI-generated questions and 65% of their corresponding answers as correct and usable with minor adjustments. Most questions focused on knowledge recall, with few assessing higher-order thinking skills. Both human experts and the AI-detection tool struggled to accurately differentiate between AI-generated and human-created questions.
Conclusion: While ChatGPT-4o can generate MCQs, its output often requires refinement. Future research should explore ways to improve the quality and cognitive level of AI-generated questions for dental education.
