David vs. Goliath: comparing conventional machine learning and a large language model for assessing students' concept use in a physics problem.

Frontiers in Artificial Intelligence, Volume 7, Article 1408817 (IF 4.7; Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-09-18. eCollection Date: 2024-01-01. DOI: 10.3389/frai.2024.1408817

Fabian Kieser, Paul Tschisgale, Sophia Rauh, Xiaoyu Bai, Holger Maus, Stefan Petersen, Manfred Stede, Knut Neumann, Peter Wulff

Abstract

Large language models have been shown to excel in many different tasks across disciplines and research sites. They provide novel opportunities to enhance educational research and instruction in different ways, such as assessment. However, these models have also been shown to have fundamental limitations. These relate, among others, to hallucinating knowledge, explainability of model decisions, and resource expenditure. As such, more conventional machine learning algorithms might be more convenient for specific research problems because they allow researchers more control over their research. Yet, the circumstances in which either conventional machine learning or large language models are the preferable choice are not well understood. This study asks to what extent conventional machine learning algorithms or a recently advanced large language model perform better in assessing students' concept use in a physics problem-solving task. We found that conventional machine learning algorithms in combination outperformed the large language model. Model decisions were then analyzed via closer examination of the models' classifications. We conclude that in specific contexts, conventional machine learning can supplement large language models, especially when labeled data is available.
