Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction

Simin Xu, Xiaowei Huang, Chung Kwan Lo, Gaowei Chen, Morris Siu-yung Jong

Computers and Education: Artificial Intelligence, Vol. 7, Article 100325, 28 October 2024. DOI: 10.1016/j.caeai.2024.100325
High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, analysing discourse data during PD programmes is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard, and a qualitative analysis was conducted to understand their coding decisions. When classifying talk turns into major categories, the overall agreement among the human standard, the ChatGPT output, and the GPT-4o output was moderate (Fleiss' kappa = 0.46). Pairwise comparisons indicated that GPT-4o (Cohen's kappa = 0.69) performed better than ChatGPT (Cohen's kappa = 0.33). At the code level, however, the performance of both AI tools was unsatisfactory. Based on the identified strengths and weaknesses, we propose a two-stage approach to classroom discourse analysis: GPT-4o can be employed for an initial category-level analysis, after which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate the timely provision of analytical resources for teachers to reflect on their teaching practices.
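The abstract does not reproduce the authors' coding pipeline, but the general setup it describes (supplying a codebook to GPT-4o and asking it to label each talk turn) can be sketched with the OpenAI Python SDK. The sketch below is an illustration only: the category names in `CODEBOOK` are invented placeholders, not the study's instrument, and `code_talk_turn` is a hypothetical helper.

```python
# Minimal sketch (not the authors' actual pipeline) of codebook-based coding
# of a single talk turn with GPT-4o via the OpenAI Python SDK (openai >= 1.0).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical major categories standing in for the study's codebook.
CODEBOOK = """You are coding classroom discourse from an online mathematics lesson.
Assign each talk turn exactly one major category:
- TEACHER_QUESTION: the teacher elicits student thinking
- STUDENT_EXPLANATION: a student explains or justifies mathematical reasoning
- EVALUATION_FEEDBACK: the teacher evaluates or builds on a student contribution
- OTHER: management, greetings, or off-task talk
Reply with the category label only."""

def code_talk_turn(turn: str, model: str = "gpt-4o") -> str:
    """Ask the model to assign one category label to a single talk turn."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # favour stable output for a coding task
        messages=[
            {"role": "system", "content": CODEBOOK},
            {"role": "user", "content": f"Talk turn: {turn}"},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(code_talk_turn("Can anyone explain why we invert the fraction here?"))
```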
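The agreement measures reported above (Fleiss' kappa across the three coders; Cohen's kappa for each AI tool against the human standard) can be computed on one's own codings with standard libraries. The sketch below uses toy labels rather than the study's data and shows one way to obtain both statistics with scikit-learn and statsmodels.

```python
# Sketch of the kind of agreement analysis reported in the abstract, on toy data.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy category-level codes for six talk turns (placeholders, not study data).
human   = ["Q", "E", "F", "Q", "E", "O"]
chatgpt = ["Q", "O", "F", "E", "E", "O"]
gpt4o   = ["Q", "E", "F", "Q", "E", "F"]

# Pairwise agreement of each AI tool against the human standard.
print("ChatGPT vs human:", cohen_kappa_score(human, chatgpt))
print("GPT-4o vs human: ", cohen_kappa_score(human, gpt4o))

# Overall agreement across all three coders: rows are talk turns, columns are
# coders; aggregate_raters converts the labels into per-category counts.
ratings = np.array([human, chatgpt, gpt4o]).T
counts, _ = aggregate_raters(ratings)
print("Fleiss' kappa:", fleiss_kappa(counts))
```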