Validity of automated essay scores for elementary-age English language learners: Evidence of bias?

IF 4.2 · CAS Tier 1 (Literature) · Q1 (Education & Educational Research) · Assessing Writing · Publication date: 2024-02-17 · DOI: 10.1016/j.asw.2024.100815
Joshua Wilson, Yue Huang
{"title":"Validity of automated essay scores for elementary-age English language learners: Evidence of bias?","authors":"Joshua Wilson ,&nbsp;Yue Huang","doi":"10.1016/j.asw.2024.100815","DOIUrl":null,"url":null,"abstract":"<div><p>Given increased prevalence of automated writing evaluation (AWE) systems in classroom settings, more research is needed to explore the potential for bias in automated scores with respect to English language learners (ELLs). Thus, this research study investigated and compared the predictive validity of automated and human scoring methods for elementary-age English ELLs on a writing test designed for ELLs and a state writing test designed for the general population. This study focused on the MI Write AWE system and sampled 2829 students comprising ELLs and non-ELLs in Grades 3–5. Results of multilevel regression analyses and simple slopes estimation indicated that, for ELLs, the automated MI Write score had similar predictive validity to the human score for both writing tests. However, automated and human scores for ELLs were less closely related to the state writing test score than scores for non-ELL students. Findings suggest that MI Write’s automated scoring was not uniquely biased relative to human scoring but does reproduce the same biases evident with human scoring. Implications and directions for future research are discussed.</p></div>","PeriodicalId":46865,"journal":{"name":"Assessing Writing","volume":"60 ","pages":"Article 100815"},"PeriodicalIF":4.2000,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Assessing Writing","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1075293524000084","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract

Given the increased prevalence of automated writing evaluation (AWE) systems in classroom settings, more research is needed to explore the potential for bias in automated scores with respect to English language learners (ELLs). This study therefore investigated and compared the predictive validity of automated and human scoring methods for elementary-age ELLs on a writing test designed for ELLs and a state writing test designed for the general population. The study focused on the MI Write AWE system and sampled 2829 students, comprising ELLs and non-ELLs in Grades 3–5. Results of multilevel regression analyses and simple slopes estimation indicated that, for ELLs, the automated MI Write score had predictive validity similar to that of the human score for both writing tests. However, both automated and human scores for ELLs were less closely related to the state writing test score than scores for non-ELL students. Findings suggest that MI Write’s automated scoring was not uniquely biased relative to human scoring but did reproduce the same biases evident in human scoring. Implications and directions for future research are discussed.
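
For readers unfamiliar with the analytic approach named in the abstract, the following is a minimal sketch of a multilevel regression with a score-by-ELL interaction followed by simple slopes estimation, written in Python with statsmodels. The variable names (state_score, mi_write_score, ell, school) and the simulated data are assumptions made purely for illustration; this is not the authors' actual model, dataset, or results.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulate a small hierarchical dataset: students nested in schools.
    # (Hypothetical data and variable names; illustrative only.)
    rng = np.random.default_rng(0)
    n_schools, n_per_school = 30, 40
    school = np.repeat(np.arange(n_schools), n_per_school)
    ell = rng.integers(0, 2, size=school.size)               # 1 = ELL, 0 = non-ELL
    mi_write_score = rng.normal(3.0, 1.0, size=school.size)  # automated essay score
    school_effect = rng.normal(0.0, 0.5, size=n_schools)[school]

    # Build the outcome so the score-to-state-test relation is weaker for ELLs
    # (purely illustrative; not the study's data or effect sizes).
    state_score = (1.0 + 0.8 * mi_write_score
                   - 0.3 * ell * mi_write_score
                   + school_effect
                   + rng.normal(0.0, 1.0, size=school.size))

    df = pd.DataFrame({"state_score": state_score,
                       "mi_write_score": mi_write_score,
                       "ell": ell,
                       "school": school})

    # Multilevel model: random intercept for school, fixed effects for the
    # automated score, ELL status, and their interaction.
    model = smf.mixedlm("state_score ~ mi_write_score * ell", df, groups=df["school"])
    result = model.fit()
    print(result.summary())

    # Simple slopes: the predictive slope of the automated score in each group.
    b = result.params
    slope_non_ell = b["mi_write_score"]
    slope_ell = b["mi_write_score"] + b["mi_write_score:ell"]
    print(f"Slope for non-ELL students: {slope_non_ell:.3f}")
    print(f"Slope for ELL students:     {slope_ell:.3f}")

The simple slopes contrast how strongly the automated score predicts the state test score for ELL versus non-ELL students, which is the kind of group comparison the abstract reports.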

Source journal

Assessing Writing
CiteScore: 6.00
Self-citation rate: 17.90%
Articles published: 67

Journal description: Assessing Writing is a refereed international journal providing a forum for ideas, research and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised) testing of writing, alternative performance assessments (such as portfolios), workplace sampling and classroom assessment. The journal focuses on all stages of the writing assessment process, including needs evaluation, assessment creation, implementation, validation, and test development.
Latest articles in this journal

- A comparative study of voice in Chinese English-major undergraduates’ timed and untimed argument writing
- The impact of task duration on the scoring of independent writing responses of adult L2-English writers
- A structural equation investigation of linguistic features as indices of writing quality in assessed secondary-level EMI learners’ scientific reports
- Detecting and assessing AI-generated and human-produced texts: The case of second language writing teachers
- Validating an integrated reading-into-writing scale with trained university students