Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction

Simin Xu , Xiaowei Huang , Chung Kwan Lo , Gaowei Chen , Morris Siu-yung Jong
{"title":"Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction","authors":"Simin Xu ,&nbsp;Xiaowei Huang ,&nbsp;Chung Kwan Lo ,&nbsp;Gaowei Chen ,&nbsp;Morris Siu-yung Jong","doi":"10.1016/j.caeai.2024.100325","DOIUrl":null,"url":null,"abstract":"<div><div>High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices.</div></div>","PeriodicalId":34469,"journal":{"name":"Computers and Education Artificial Intelligence","volume":"7 ","pages":"Article 100325"},"PeriodicalIF":0.0000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers and Education Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666920X24001280","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Social Sciences","Score":null,"Total":0}
引用次数: 0

Abstract

High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
评估 ChatGPT 和 GPT-4o 在课堂话语数据编码方面的性能:同步在线数学教学研究
高质量的教学对促进学生的学习至关重要,这促使许多教师专业发展(PD)计划将重点放在改善课堂对话上。然而,在教师专业发展(PD)项目中,分析对话数据非常耗时,会延迟对教师表现的反馈,并可能影响项目的有效性。因此,我们探索使用 ChatGPT(经过微调的 GPT-3.5 系列模型)和 GPT-4o 对课堂对话数据进行自动编码。我们为这些人工智能工具配备了专为数学话语和学术性对话设计的编码手册。我们的数据集包括来自同步在线数学课的 400 多个真实的中文对话。我们将 ChatGPT 和 GPT-4o 的编码结果与人类标准进行了定量比较。为了理解它们的编码决定,还进行了定性分析。在将说话转折分为主要类别时,人类标准、ChatGPT 输出和 GPT-4o 输出之间的总体一致性为中等(弗莱斯卡帕 = 0.46)。配对比较表明,GPT-4o(科恩卡帕 = 0.69)的表现优于 ChatGPT(科恩卡帕 = 0.33)。然而,在代码层面,两种人工智能工具的表现都不尽如人意。根据所发现的能力和弱点,我们提出了一种两阶段的课堂话语分析方法。具体来说,可以使用 GPT-4o 进行初步的类别分析,然后教师教育者可以进行更详细的代码级分析,并完善编码结果。这种方法有助于及时为教师提供分析资源,帮助他们反思自己的教学实践。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
16.80
自引率
0.00%
发文量
66
审稿时长
50 days
期刊最新文献
Enhancing data analysis and programming skills through structured prompt training: The impact of generative AI in engineering education Understanding the practices, perceptions, and (dis)trust of generative AI among instructors: A mixed-methods study in the U.S. higher education Technological self-efficacy and sense of coherence: Key drivers in teachers' AI acceptance and adoption The influence of AI literacy on complex problem-solving skills through systematic thinking skills and intuition thinking skills: An empirical study in Thai gen Z accounting students Psychometrics of an Elo-based large-scale online learning system
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1