David vs. Goliath: comparing conventional machine learning and a large language model for assessing students' concept use in a physics problem.

Frontiers in Artificial Intelligence (IF 3, JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-09-18. eCollection Date: 2024-01-01. DOI: 10.3389/frai.2024.1408817
Fabian Kieser, Paul Tschisgale, Sophia Rauh, Xiaoyu Bai, Holger Maus, Stefan Petersen, Manfred Stede, Knut Neumann, Peter Wulff
Frontiers in Artificial Intelligence, volume 7, article 1408817. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445140/pdf/

Abstract

Large language models have been shown to excel in many different tasks across disciplines and research sites. They provide novel opportunities to enhance educational research and instruction in different ways such as assessment. However, these methods have also been shown to have fundamental limitations. These relate, among others, to hallucinating knowledge, explainability of model decisions, and resource expenditure. As such, more conventional machine learning algorithms might be more convenient for specific research problems because they allow researchers more control over their research. Yet, the circumstances in which either conventional machine learning or large language models are preferable choices are not well understood. This study seeks to answer the question to what extent either conventional machine learning algorithms or a recently advanced large language model performs better in assessing students' concept use in a physics problem-solving task. We found that conventional machine learning algorithms in combination outperformed the large language model. Model decisions were then analyzed via closer examination of the models' classifications. We conclude that in specific contexts, conventional machine learning can supplement large language models, especially when labeled data is available.
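
The abstract reports that conventional machine learning algorithms "in combination" outperformed the large language model, but it does not say which algorithms were used or how they were combined. The sketch below is only an illustration of one common way to combine classifiers, majority voting, and is not the authors' method. The training sentences, the labels "energy" and "other", the three base classifiers, and all function and class names are hypothetical and chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total)
            for w in tokenize(text):
                score += math.log(
                    (self.word_counts[c][w] + 1) / (n_words + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = c, score
        return best_label


class NearestNeighbour:
    """1-nearest-neighbour using Jaccard similarity between token sets."""

    def fit(self, texts, labels):
        self.examples = [(set(tokenize(t)), l) for t, l in zip(texts, labels)]
        return self

    def predict(self, text):
        tokens = set(tokenize(text))

        def jaccard(other):
            union = tokens | other
            return len(tokens & other) / len(union) if union else 0.0

        return max(self.examples, key=lambda ex: jaccard(ex[0]))[1]


class KeywordRule:
    """Hand-written rule: the word 'energy' signals the 'energy' label."""

    def fit(self, texts, labels):
        return self  # nothing to learn; kept for a uniform interface

    def predict(self, text):
        return "energy" if "energy" in tokenize(text) else "other"


def majority_vote(predictions):
    """Return the label predicted most often."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data, invented for this illustration only.
TRAIN_TEXTS = [
    "energy is conserved so kinetic energy equals potential energy",
    "use conservation of energy to find the speed",
    "the force equals mass times acceleration",
    "apply newton second law to the forces",
]
TRAIN_LABELS = ["energy", "energy", "other", "other"]

MODELS = [
    NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS),
    NearestNeighbour().fit(TRAIN_TEXTS, TRAIN_LABELS),
    KeywordRule().fit(TRAIN_TEXTS, TRAIN_LABELS),
]


def ensemble_predict(text):
    """Combine the three base classifiers by majority vote."""
    return majority_vote([m.predict(text) for m in MODELS])
```

Calling `ensemble_predict("potential energy converts to kinetic energy")` returns the label chosen by at least two of the three classifiers. With three voters and two labels a tie cannot occur, which is why an odd number of base classifiers is used here.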

Source journal metrics: CiteScore 6.10; self-citation rate 2.50%; articles published 272; review time 13 weeks.