Machine Learning-Based Automated Grading and Feedback Tools for Programming: A Meta-Analysis

Marcus Messer, Neil C. C. Brown, Michael Kölling, Miaojing Shi
{"title":"Machine Learning-Based Automated Grading and Feedback Tools for Programming: A Meta-Analysis","authors":"Marcus Messer, Noam Brown, Michael Kölling, Miaojing Shi","doi":"10.1145/3587102.3588822","DOIUrl":null,"url":null,"abstract":"Research into automated grading has increased as Computer Science courses grow. Dynamic and static approaches are typically used to implement these graders, the most common implementation being unit testing to grade correctness. This paper expands upon an ongoing systematic literature review to provide an in-depth analysis of how machine learning (ML) has been used to grade and give feedback on programming assignments. We conducted a backward snowball search using the ML papers from an ongoing systematic review and selected 27 papers that met our inclusion criteria. After selecting our papers, we analysed the skills graded, the preprocessing steps, the ML implementation, and the models' evaluations. We find that most the models are implemented using neural network-based approaches, with most implementing some form of recurrent neural network (RNN), including Long Short-Term Memory, and encoder/decoder with attention mechanisms. Some graders implement traditional ML approaches, typically focused on clustering. Most ML-based automated grading, not many use ML to evaluate maintainability, readability, and documentation, but focus on grading correctness, a problem that dynamic and static analysis techniques, such as unit testing, rule-based program repair, and comparison to models or approved solutions, have mostly resolved. However, some ML-based tools, including those for assessing graphical output, have evaluated the correctness of assignments that conventional implementations cannot.","PeriodicalId":410890,"journal":{"name":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3587102.3588822","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Research into automated grading has increased as Computer Science courses have grown. Dynamic and static approaches are typically used to implement these graders, with the most common implementation being unit testing to grade correctness. This paper expands upon an ongoing systematic literature review to provide an in-depth analysis of how machine learning (ML) has been used to grade and give feedback on programming assignments. We conducted a backward snowball search using the ML papers from an ongoing systematic review and selected 27 papers that met our inclusion criteria. After selecting our papers, we analysed the skills graded, the preprocessing steps, the ML implementation, and the models' evaluations. We find that most of the models are implemented using neural network-based approaches, with most implementing some form of recurrent neural network (RNN), including Long Short-Term Memory (LSTM), and encoder/decoder architectures with attention mechanisms. Some graders implement traditional ML approaches, typically focused on clustering. Most ML-based automated graders do not use ML to evaluate maintainability, readability, or documentation, but instead focus on grading correctness, a problem that dynamic and static analysis techniques, such as unit testing, rule-based program repair, and comparison to model or approved solutions, have mostly resolved. However, some ML-based tools, including those for assessing graphical output, have evaluated the correctness of assignments that conventional implementations cannot.
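
The RNN-style graders the abstract describes generally tokenize a submission, embed the token sequence, and map a recurrent encoding of it to a score. Below is a minimal, hypothetical sketch of that idea in PyTorch; it is not any surveyed tool's implementation, and every class name and hyperparameter here is an illustrative assumption.

```python
# Illustrative sketch only: an LSTM that reads integer-encoded program
# tokens and regresses a grade in [0, 1]. All names and sizes are assumed.
import torch
import torch.nn as nn

class LSTMGrader(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # map final state to a score

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer-encoded submission tokens
        embedded = self.embed(token_ids)
        _, (hidden, _) = self.lstm(embedded)          # hidden: (1, batch, hidden_dim)
        return torch.sigmoid(self.head(hidden[-1]))  # (batch, 1) predicted grade

# Hypothetical usage: grade a batch of two tokenized submissions.
model = LSTMGrader(vocab_size=5000)
fake_batch = torch.randint(0, 5000, (2, 200))  # two submissions, 200 tokens each
print(model(fake_batch))
```

The clustering-based graders mentioned above can be sketched in the same hedged spirit: group similar submissions so that a grade assigned to one cluster representative can be propagated to its neighbours. The feature choice below (TF-IDF over whitespace-separated code tokens) is likewise an assumption for illustration, not a method taken from the paper.

```python
# Illustrative sketch only: cluster submissions so similar ones can share a grade.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

submissions = ["def add(a, b): return a + b",
               "def add(x, y): return x + y",
               "def add(a, b): return a - b"]
features = TfidfVectorizer(token_pattern=r"\S+").fit_transform(submissions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # submissions sharing a label can share a manually assigned grade
```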