{"title":"Beyond Agreement: Exploring Rater Effects in Large-Scale Mixed Format Assessments","authors":"Stefanie A. Wind, Wenjing Guo","doi":"10.1080/10627197.2021.1962277","DOIUrl":null,"url":null,"abstract":"ABSTRACT Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and reliability analyses, such as severity/leniency, centrality/extremism, and biases. Left undetected, these effects pose threats to fairness. We illustrate how rater effects analyses can be incorporated into scoring procedures for large-scale mixed-format assessments. We used data from the National Assessment of Educational Progress (NAEP) to illustrate relatively simple analyses that can provide insight into patterns of rater judgment that may warrant additional attention. Our results suggested that the NAEP raters exhibited generally defensible psychometric properties, while also exhibiting some idiosyncrasies that could inform scoring procedures. Similar procedures could be used operationally to inform the interpretation and use of rater judgments in large-scale mixed-format assessments.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":"26 1","pages":"264 - 283"},"PeriodicalIF":2.1000,"publicationDate":"2021-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Educational Assessment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/10627197.2021.1962277","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 1
Abstract
Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and reliability analyses, such as severity/leniency, centrality/extremism, and biases. Left undetected, these effects pose threats to fairness. We illustrate how rater effects analyses can be incorporated into scoring procedures for large-scale mixed-format assessments. We used data from the National Assessment of Educational Progress (NAEP) to illustrate relatively simple analyses that can provide insight into patterns of rater judgment that may warrant additional attention. Our results suggested that the NAEP raters exhibited generally defensible psychometric properties, while also exhibiting some idiosyncrasies that could inform scoring procedures. Similar procedures could be used operationally to inform the interpretation and use of rater judgments in large-scale mixed-format assessments.
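To make the distinction between agreement checks and rater effects concrete, the sketch below computes simple descriptive indices for each rater: exact agreement with a second scorer, a severity/leniency index (mean deviation from the overall score mean), and a centrality/extremism ratio (rater score spread relative to overall spread). This is a minimal illustration only; the column names (rater_id, score, second_score) are hypothetical, and the paper's own analyses of NAEP data rely on more formal psychometric modeling than these descriptive statistics.

```python
# Minimal sketch of rater-effects indices that go beyond raw agreement.
# Column names are hypothetical and not drawn from the NAEP data files.
import pandas as pd

def rater_effect_indices(df: pd.DataFrame) -> pd.DataFrame:
    """Per-rater exact agreement, severity/leniency, and centrality indices."""
    overall_mean = df["score"].mean()
    overall_sd = df["score"].std()

    def summarize(group: pd.DataFrame) -> pd.Series:
        return pd.Series({
            # Exact agreement with the second (check) rater.
            "exact_agreement": (group["score"] == group["second_score"]).mean(),
            # Positive values suggest leniency; negative values suggest severity.
            "severity_leniency": group["score"].mean() - overall_mean,
            # Ratios well below 1 suggest centrality (overuse of middle categories);
            # ratios well above 1 suggest extremism.
            "centrality_ratio": group["score"].std() / overall_sd,
            "n_ratings": len(group),
        })

    return df.groupby("rater_id").apply(summarize)

# Toy double-scored data set for illustration only.
ratings = pd.DataFrame({
    "rater_id":     ["R1", "R1", "R1", "R2", "R2", "R2"],
    "score":        [2, 3, 2, 4, 4, 3],
    "second_score": [2, 2, 2, 4, 3, 3],
})
print(rater_effect_indices(ratings))
```

Indices like these can flag raters whose scores warrant a closer look even when overall agreement statistics appear acceptable, which is the kind of screening the abstract describes incorporating into operational scoring procedures.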
Journal introduction:
Educational Assessment publishes original research and scholarship on the assessment of individuals, groups, and programs in educational settings. It includes theory, methodological approaches and empirical research in the appraisal of the learning and achievement of students and teachers, young children and adults, and novices and experts. The journal reports on current large-scale testing practices, discusses alternative approaches, presents scholarship on classroom assessment practices and includes assessment topics debated at the national level. It welcomes both conceptual and empirical pieces and encourages articles that provide a strong bridge between theory and/or empirical research and the implications for educational policy and/or practice.