Title: Validity of automated essay scores for elementary-age English language learners: Evidence of bias?
Authors: Joshua Wilson, Yue Huang
DOI: 10.1016/j.asw.2024.100815
Journal: Assessing Writing, Volume 60, Article 100815
Publication date: 2024-02-17
Impact factor: 4.2; JCR: Q1 (Education & Educational Research)
URL: https://www.sciencedirect.com/science/article/pii/S1075293524000084
Citations: 0
Abstract
Given the increased prevalence of automated writing evaluation (AWE) systems in classroom settings, more research is needed to explore the potential for bias in automated scores with respect to English language learners (ELLs). Thus, this study investigated and compared the predictive validity of automated and human scoring methods for elementary-age ELLs on a writing test designed for ELLs and on a state writing test designed for the general population. The study focused on the MI Write AWE system and sampled 2,829 students, comprising ELLs and non-ELLs in Grades 3–5. Results of multilevel regression analyses and simple slopes estimation indicated that, for ELLs, the automated MI Write score had predictive validity similar to that of the human score on both writing tests. However, for ELLs, both automated and human scores were less closely related to the state writing test score than were scores for non-ELL students. Findings suggest that MI Write's automated scoring was not uniquely biased relative to human scoring but reproduced the same biases evident in human scoring. Implications and directions for future research are discussed.
Journal Introduction:
Assessing Writing is a refereed international journal providing a forum for ideas, research, and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised) testing of writing, alternative performance assessments (such as portfolios), workplace sampling, and classroom assessment. The journal covers all stages of the writing assessment process, including needs evaluation, test development, assessment creation, implementation, and validation.