Evaluating large language models in analysing classroom dialogue.

npj Science of Learning; Published: 2024-10-03; DOI: 10.1038/s41539-024-00273-3
Impact factor: 3; Zone 1 (Psychology); Q1, Education & Educational Research

Yun Long, Haifeng Luo, Yu Zhang

Citations: 0

Abstract

This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue, a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using datasets from middle school mathematics and Chinese classes, experts manually coded the classroom dialogues, which were then analysed with a customised GPT-4 model. The study compares the manual annotations with the GPT-4 outputs to evaluate efficacy. Metrics include time efficiency, inter-coder agreement, and reliability between human coders and GPT-4. Results show significant time savings and high coding consistency between the model and human coders, with minor discrepancies. These findings highlight the strong potential of LLMs in teaching evaluation and facilitation.
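To make "inter-coder agreement" concrete, the sketch below compares two coders' labels for the same sequence of utterances, for example a human expert and GPT-4. The abstract does not state which agreement statistic the authors used, so simple percent agreement and Cohen's kappa are shown here only as two common choices; the function names and the code labels (`question`, `explain`, `evaluate`, `other`) are hypothetical and are not the study's coding scheme.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of utterances to which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(codes_a, codes_b)
    n = len(codes_a)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability that both coders, picking independently
    # according to their own label frequencies, choose the same label.
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in labels)
    if p_e == 1.0:
        # Degenerate case: both coders used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for six utterances (not data from the study).
human = ["question", "explain", "question", "evaluate", "explain", "other"]
gpt4 = ["question", "explain", "explain", "evaluate", "explain", "other"]

print(round(percent_agreement(human, gpt4), 3))  # 0.833
print(round(cohens_kappa(human, gpt4), 3))       # 0.769
```

Kappa is lower than raw agreement because part of the matching would be expected by chance alone, given how often each coder uses each label.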








Source journal: npj Science of Learning

CiteScore: 5.40

Self-citation rate: 7.10%
