Since the public release of ChatGPT in late 2022, the role of generative AI chatbots in education has been widely debated. While some see their potential as automated tutors, others worry that inaccuracies and hallucinations could harm student learning. This study assesses the capabilities and limitations of ChatGPT models in serving as a non-interactive, automated tutor. For this, we use a comparative benchmark design in which the models complete the same tasks under predefined success criteria. We compare three ChatGPT models (GPT-3.5, GPT-4o, and o1-preview) on tasks comprising the explanation of 56 economic concepts and the answering of 25 multiple-choice questions, and we evaluate the responses using a marking grid. Our findings indicate that newer models generate highly accurate responses, although some inaccuracies persist. A key concern is that ChatGPT presents all responses with complete confidence, making errors difficult for students to recognize. Furthermore, explanations are often quite narrow, lacking holistic perspectives, and the quality of examples remains poor. Despite these limitations, we argue that ChatGPT can serve as an effective automated tutor for basic, knowledge-based questions, supporting students while posing a manageable risk of misinformation. However, educators should teach students about the effective use and limitations of the technology.