Who's the Best Detective? Large Language Models vs. Traditional Machine Learning in Detecting Incoherent Fourth Grade Math Answers

Impact Factor: 4.0 · CAS Tier 2 (Education) · JCR Q1 (Education & Educational Research)
Journal of Educational Computing Research · Pub Date: 2023-11-10 · DOI: 10.1177/07356331231191174
Felipe Urrutia, Roberto Araya
{"title":"Who's the Best Detective? Large Language Models vs. Traditional Machine Learning in Detecting Incoherent Fourth Grade Math Answers","authors":"Felipe Urrutia, Roberto Araya","doi":"10.1177/07356331231191174","DOIUrl":null,"url":null,"abstract":"Written answers to open-ended questions can have a higher long-term effect on learning than multiple-choice questions. However, it is critical that teachers immediately review the answers, and ask to redo those that are incoherent. This can be a difficult task and can be time-consuming for teachers. A possible solution is to automate the detection of incoherent answers. One option is to automate the review with Large Language Models (LLM). They have a powerful discursive ability that can be used to explain decisions. In this paper, we analyze the responses of fourth graders in mathematics using three LLMs: GPT-3, BLOOM, and YOU. We used them with zero, one, two, three and four shots. We compared their performance with the results of various classifiers trained with Machine Learning (ML). We found that LLMs perform worse than MLs in detecting incoherent answers. The difficulty seems to reside in recursive questions that contain both questions and answers, and in responses from students with typical fourth-grader misspellings. Upon closer examination, we have found that the ChatGPT model faces the same challenges.","PeriodicalId":47865,"journal":{"name":"Journal of Educational Computing Research","volume":"114 19","pages":"0"},"PeriodicalIF":4.0000,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational Computing Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/07356331231191174","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract

Written answers to open-ended questions can have a higher long-term effect on learning than multiple-choice questions. However, it is critical that teachers review the answers immediately and ask students to redo those that are incoherent. This can be a difficult and time-consuming task for teachers. A possible solution is to automate the detection of incoherent answers. One option is to automate the review with Large Language Models (LLMs), whose powerful discursive ability can also be used to explain decisions. In this paper, we analyze the responses of fourth graders in mathematics using three LLMs: GPT-3, BLOOM, and YOU. We prompted them with zero, one, two, three, and four shots, and compared their performance with the results of various classifiers trained with Machine Learning (ML). We found that the LLMs perform worse than the ML classifiers in detecting incoherent answers. The difficulty seems to reside in recursive questions that contain both questions and answers, and in responses from students with typical fourth-grader misspellings. Upon closer examination, we found that the ChatGPT model faces the same challenges.
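The abstract contrasts two detection strategies: prompting an LLM with a handful of labelled examples ("shots") versus training a traditional ML classifier on teacher-labelled answers. The paper's actual prompts, features, and dataset are not reproduced on this page, so the sketch below is purely illustrative: the example answers, the labels, and the TF-IDF plus logistic-regression baseline are assumptions, and the LLM API call itself is omitted.

```python
# Illustrative sketch only (not the authors' code): a toy ML baseline and a
# k-shot prompt builder for classifying student answers as coherent/incoherent.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# --- Traditional ML baseline: train on teacher-labelled answers (hypothetical data)
answers = [
    "I added 25 and 17 and got 42",   # coherent
    "because yes",                    # incoherent
    "the answer is 9 apples",         # coherent
    "asdf dont know lol",             # incoherent
]
labels = [0, 1, 0, 1]                 # 1 = incoherent

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(answers, labels)
print(clf.predict(["i think 25 plus 17 is 42"]))  # likely [0] (coherent) on this toy data

# --- Few-shot LLM prompting: the k labelled examples are the "shots"
def build_prompt(question: str, answer: str, shots: list[tuple[str, str, str]]) -> str:
    """Assemble a k-shot prompt asking the model to judge answer coherence."""
    lines = ["Decide whether each student's answer is Coherent or Incoherent."]
    for q, a, verdict in shots:
        lines.append(f"Question: {q}\nAnswer: {a}\nVerdict: {verdict}")
    lines.append(f"Question: {question}\nAnswer: {answer}\nVerdict:")
    return "\n\n".join(lines)

prompt = build_prompt(
    question="How many legs do 3 dogs have in total?",
    answer="3 dogs so 3 times 4 is 12 legs",
    shots=[("What is 5 + 5?", "idk whatever", "Incoherent"),
           ("What is 5 + 5?", "5 plus 5 makes 10", "Coherent")],
)
# `prompt` would then be sent to GPT-3, BLOOM, or YOU and the returned
# verdict parsed; the API call is omitted here.
print(prompt)
```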
Source Journal
Journal of Educational Computing Research (Education & Educational Research)
CiteScore: 11.90 · Self-citation rate: 6.20% · Articles published: 69
Journal Description: The goal of this Journal is to provide an international scholarly publication forum for peer-reviewed interdisciplinary research into the applications, effects, and implications of computer-based education. The Journal features articles useful for practitioners and theorists alike. The terms "education" and "computing" are viewed broadly. "Education" refers to the use of computer-based technologies at all levels of the formal education system, business and industry, home-schooling, lifelong learning, and unintentional learning environments. "Computing" refers to all forms of computer applications and innovations, both hardware and software. For example, this could range from mobile and ubiquitous computing to immersive 3D simulations and games to computing-enhanced virtual learning environments.
Latest Articles in This Journal
Promoting Letter-Naming and Initial-Phoneme Detection Abilities Among Preschoolers at Risk for Specific Learning Disorder Using Technological Intervention With Two Types of Mats: With and Without Target Letter Forms
Investigating the Effects of Artificial Intelligence-Assisted Language Learning Strategies on Cognitive Load and Learning Outcomes: A Comparative Study
Curiosity, Interest, and Engagement: Unpacking Their Roles in Students' Learning within a Virtual Game Environment
Does Generative Artificial Intelligence Improve the Academic Achievement of College Students? A Meta-Analysis
Designing an Inclusive Artificial Intelligence (AI) Curriculum for Elementary Students to Address Gender Differences With Collaborative and Tangible Approaches