Machine Learning-Based Automated Grading and Feedback Tools for Programming: A Meta-Analysis

Marcus Messer, Neil C. C. Brown, Michael Kölling, Miaojing Shi
{"title":"Machine Learning-Based Automated Grading and Feedback Tools for Programming: A Meta-Analysis","authors":"Marcus Messer, Noam Brown, Michael Kölling, Miaojing Shi","doi":"10.1145/3587102.3588822","DOIUrl":null,"url":null,"abstract":"Research into automated grading has increased as Computer Science courses grow. Dynamic and static approaches are typically used to implement these graders, the most common implementation being unit testing to grade correctness. This paper expands upon an ongoing systematic literature review to provide an in-depth analysis of how machine learning (ML) has been used to grade and give feedback on programming assignments. We conducted a backward snowball search using the ML papers from an ongoing systematic review and selected 27 papers that met our inclusion criteria. After selecting our papers, we analysed the skills graded, the preprocessing steps, the ML implementation, and the models' evaluations. We find that most the models are implemented using neural network-based approaches, with most implementing some form of recurrent neural network (RNN), including Long Short-Term Memory, and encoder/decoder with attention mechanisms. Some graders implement traditional ML approaches, typically focused on clustering. Most ML-based automated grading, not many use ML to evaluate maintainability, readability, and documentation, but focus on grading correctness, a problem that dynamic and static analysis techniques, such as unit testing, rule-based program repair, and comparison to models or approved solutions, have mostly resolved. However, some ML-based tools, including those for assessing graphical output, have evaluated the correctness of assignments that conventional implementations cannot.","PeriodicalId":410890,"journal":{"name":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3587102.3588822","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Research into automated grading has increased as Computer Science courses have grown. Dynamic and static approaches are typically used to implement these graders, with the most common implementation being unit testing to grade correctness. This paper expands upon an ongoing systematic literature review to provide an in-depth analysis of how machine learning (ML) has been used to grade and give feedback on programming assignments. We conducted a backward snowball search using the ML papers from an ongoing systematic review and selected 27 papers that met our inclusion criteria. After selecting our papers, we analysed the skills graded, the preprocessing steps, the ML implementation, and the models' evaluations. We find that most of the models are implemented using neural network-based approaches, with most implementing some form of recurrent neural network (RNN), including Long Short-Term Memory (LSTM), and encoder/decoder architectures with attention mechanisms. Some graders implement traditional ML approaches, typically focused on clustering. Most ML-based automated graders do not use ML to evaluate maintainability, readability, or documentation, but instead focus on grading correctness, a problem that dynamic and static analysis techniques, such as unit testing, rule-based program repair, and comparison to model or approved solutions, have mostly resolved. However, some ML-based tools, including those for assessing graphical output, have evaluated the correctness of assignments that conventional implementations cannot.
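
The RNN-style graders the abstract describes generally tokenize a submission, embed the token sequence, and map a recurrent encoding of it to a score. Below is a minimal, hypothetical sketch of that idea in PyTorch; it is not any surveyed tool's implementation, and every class name and hyperparameter here is an illustrative assumption.

```python
# Illustrative sketch only: an LSTM that reads integer-encoded program
# tokens and regresses a grade in [0, 1]. All names and sizes are assumed.
import torch
import torch.nn as nn

class LSTMGrader(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # map final state to a score

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer-encoded submission tokens
        embedded = self.embed(token_ids)
        _, (hidden, _) = self.lstm(embedded)          # hidden: (1, batch, hidden_dim)
        return torch.sigmoid(self.head(hidden[-1]))  # (batch, 1) predicted grade

# Hypothetical usage: grade a batch of two tokenized submissions.
model = LSTMGrader(vocab_size=5000)
fake_batch = torch.randint(0, 5000, (2, 200))  # two submissions, 200 tokens each
print(model(fake_batch))
```

The clustering-based graders mentioned above can be sketched in the same hedged spirit: group similar submissions so that a grade assigned to one cluster representative can be propagated to its neighbours. The feature choice below (TF-IDF over whitespace-separated code tokens) is likewise an assumption for illustration, not a method taken from the paper.

```python
# Illustrative sketch only: cluster submissions so similar ones can share a grade.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

submissions = ["def add(a, b): return a + b",
               "def add(x, y): return x + y",
               "def add(a, b): return a - b"]
features = TfidfVectorizer(token_pattern=r"\S+").fit_transform(submissions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # submissions sharing a label can share a manually assigned grade
```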