Automated Marking System for Essay Questions

O. Obot, Peter G. Obike, Imaobong James
{"title":"Automated Marking System for Essay Questions","authors":"O. Obot, Peter G. Obike, Imaobong James","doi":"10.9734/jerr/2024/v26i51139","DOIUrl":null,"url":null,"abstract":"The stress of marking assessment scripts of many candidates often results in fatigue that could lead to low productivity and reduced consistency. In most cases, candidates use words, phrases and sentences that are synonyms or related in meaning to those stated in the marking scheme, however, examiners rely solely on the exact words specified in the marking scheme. This often leads to inconsistent grading and in most cases, candidates are disadvantaged. This study seeks to address these inconsistencies during assessment by evaluating the marked answer scripts and the marking scheme of Introduction to File Processing (CSC 221) from the Department of Computer Science, University of Uyo, Nigeria. These were collected and used with the Microsoft Research Paraphrase (MSRP) corpus. After preprocessing the datasets, they were subjected to Logistic Regression (LR), a machine learning technique where the semantic similarity of the answers of the candidates was measured in relation to the marking scheme of the examiner using the MSRP corpus model earlier trained on the Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. Results of the experiment show a strong correlation coefficient of 0.89 and a Mean Relative Error (MRE) of 0.59 compared with the scores awarded by the human marker (examiner). Analysis of the error indicates that block marks were assigned to answers in the marking scheme while the automated marking system breaks the block marks into chunks based on phrases both in the marking scheme and the candidates’ answers. 
It also shows that some semantically related words were ignored by the examiner.","PeriodicalId":508164,"journal":{"name":"Journal of Engineering Research and Reports","volume":"74 7","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Engineering Research and Reports","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.9734/jerr/2024/v26i51139","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The stress of marking the assessment scripts of many candidates often results in fatigue, which can lead to low productivity and reduced consistency. In most cases, candidates use words, phrases and sentences that are synonyms of, or related in meaning to, those stated in the marking scheme; examiners, however, rely solely on the exact words specified in the marking scheme. This often leads to inconsistent grading, and in most cases candidates are disadvantaged. This study seeks to address these inconsistencies by evaluating the marked answer scripts and the marking scheme of Introduction to File Processing (CSC 221) from the Department of Computer Science, University of Uyo, Nigeria. These were collected and used together with the Microsoft Research Paraphrase (MSRP) corpus. After preprocessing, the datasets were subjected to Logistic Regression (LR), a machine learning technique in which the semantic similarity of the candidates' answers was measured against the examiner's marking scheme, using an MSRP corpus model previously trained on Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. Results of the experiment show a strong correlation coefficient of 0.89 and a Mean Relative Error (MRE) of 0.59 compared with the scores awarded by the human marker (examiner). Analysis of the error indicates that block marks were assigned to answers in the marking scheme, while the automated marking system breaks the block marks into chunks based on phrases in both the marking scheme and the candidates' answers. It also shows that some semantically related words were ignored by the examiner.
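The pipeline the abstract describes (TF-IDF vectorization of sentence pairs, then Logistic Regression to judge semantic similarity between a candidate's answer and the marking scheme) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy sentence pairs stand in for the MSRP corpus, and the single cosine-similarity feature is an assumed simplification of whatever feature set the paper actually used.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for MSRP-style training data: (sentence_1, sentence_2, is_paraphrase)
pairs = [
    ("a file is a collection of records", "records grouped together form a file", 1),
    ("sequential files are read in order", "records are accessed one after another", 1),
    ("a key uniquely identifies a record", "indexes speed up searching", 0),
    ("hashing maps a key to an address", "the weather is sunny today", 0),
]

# Fit the TF-IDF vocabulary on every sentence in the corpus.
vectorizer = TfidfVectorizer().fit([s for p in pairs for s in p[:2]])

def pair_features(s1: str, s2: str) -> list[float]:
    """Represent a sentence pair by the cosine similarity of its TF-IDF vectors."""
    v1 = vectorizer.transform([s1]).toarray()[0]
    v2 = vectorizer.transform([s2]).toarray()[0]
    cos = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9))
    return [cos]

# Train Logistic Regression to map similarity features to a paraphrase judgement.
X = [pair_features(s1, s2) for s1, s2, _ in pairs]
y = [label for _, _, label in pairs]
clf = LogisticRegression().fit(X, y)

def similarity_score(candidate_answer: str, scheme_answer: str) -> float:
    """Probability that the candidate's answer paraphrases the marking-scheme answer."""
    return clf.predict_proba([pair_features(candidate_answer, scheme_answer)])[0][1]
```

In a marking context, `similarity_score` would then be used to award a phrase's chunk of marks whenever the score exceeds a chosen threshold, rather than requiring an exact word match against the scheme.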