The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

International Journal of Testing (IF 1.4, Q2, Social Sciences, Interdisciplinary) | Pub Date: 2018-01-02 | DOI: 10.1080/15305058.2017.1361426

Stefanie A. Wind, E. Wolfe, G. Engelhard, P. Foltz, Mark Rosenstein

Citations: 11

Abstract

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be “trained” using machine-learning techniques that incorporate human ratings. However, the quality of the human ratings used to train the AESEs is rarely examined. As a result, the impact of various rater effects (e.g., severity and centrality) on the quality of AESE-assigned scores is not known. In this study, we use data from a large-scale rater-mediated writing assessment to examine the impact of rater effects on the quality of AESE-assigned scores. Overall, the results suggest that if rater effects are present in the ratings used to train an AESE, the AESE scores may replicate these effects. Implications are discussed in terms of research and practice related to automated scoring.

Source journal

International Journal of Testing (Social Sciences, Interdisciplinary)

CiteScore

3.60

Self-citation rate

11.80%

Articles published