Predictive Modeling of Rater Behavior: Implications for Quality Assurance in Essay Scoring
I. Bejar, Chen Li, D. McCaffrey
Applied Measurement in Education, vol. 33, pp. 234-247. Published 2020-07-02. DOI: https://doi.org/10.1080/08957347.2020.1750406
ABSTRACT We evaluate the feasibility of developing predictive models of rater behavior, that is, rater-specific models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays used by the e-rater® engine. Specifically, for each rater, the linear regression of rater scores on the linguistic attributes is obtained based on data from two consecutive time periods. The regression from each period was cross validated against data from the other period. Raters were characterized in terms of their level of predictability and the importance of the predictors. Results suggest that rater models capture stable individual differences among raters. To evaluate the feasibility of using rater models as a quality control mechanism, we evaluated the relationship between rater predictability and inter-rater agreement and performance on pre-scored essays. Finally, we conducted a simulation whereby raters are simulated to score exclusively as a function of essay length at different points during the scoring day. We concluded that predictive rater models merit further investigation as a means of quality controlling human scoring.
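The cross-period validation described in the abstract can be illustrated with a minimal sketch. It assumes synthetic data in place of real essays: the three feature columns stand in for e-rater linguistic attributes, the weights are invented, and using the correlation between predicted and observed scores as the measure of "predictability" is an assumption made here, not a detail confirmed by the abstract. The function names (`fit_ols`, `predict`, `cross_period_predictability`) are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients (intercept first) for y ~ X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to a feature matrix."""
    return coef[0] + X @ coef[1:]


def cross_period_predictability(X1, y1, X2, y2):
    """Fit one rater's model on each period and validate on the other.

    Returns two correlations: (fit on period 1, tested on period 2) and
    (fit on period 2, tested on period 1).
    """
    r12 = np.corrcoef(predict(fit_ols(X1, y1), X2), y2)[0, 1]
    r21 = np.corrcoef(predict(fit_ols(X2, y2), X1), y1)[0, 1]
    return r12, r21


# Synthetic stand-in for one rater's scores in two consecutive periods.
rng = np.random.default_rng(0)
n_essays, n_features = 200, 3
true_w = np.array([0.8, 0.3, -0.2])  # assumed weights, illustration only
X1 = rng.normal(size=(n_essays, n_features))
X2 = rng.normal(size=(n_essays, n_features))
y1 = 3 + X1 @ true_w + rng.normal(scale=0.5, size=n_essays)
y2 = 3 + X2 @ true_w + rng.normal(scale=0.5, size=n_essays)

r12, r21 = cross_period_predictability(X1, y1, X2, y2)
print(f"period 1 -> period 2: r = {r12:.2f}")
print(f"period 2 -> period 1: r = {r21:.2f}")
```

In this setup, a rater whose two cross-period correlations are both high and similar would count as stably predictable. A sharp drop in one direction might indicate a change in scoring behavior between periods.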
About the journal:
Because interaction between the domains of research and application is critical to the evaluation and improvement of new educational measurement practices, the prime objective of Applied Measurement in Education is to improve communication between academicians and practitioners. To help bridge the gap between theory and practice, articles in this journal describe original research studies, innovative strategies for solving educational measurement problems, and integrative reviews of current approaches to contemporary measurement issues. Peer Review Policy: All review papers in this journal have undergone editorial screening and peer review.