Measurement: Interdisciplinary Research and Perspectives, 55(1), 228–247. Published 2022-10-02. DOI: 10.1080/15366367.2021.1972654
Detecting Rater Centrality Effects in Performance Assessments: A Model-Based Comparison of Centrality Indices
ABSTRACT Recent research on rater effects in performance assessments has increasingly focused on rater centrality, the tendency to assign scores clustering around the rating scale’s middle categories. In the present paper, we adopted Jin and Wang’s (2018) extended facets modeling approach and constructed a centrality continuum, ranging from raters exhibiting strong central tendencies to raters exhibiting strong tendencies in the opposite direction (extremity). In two simulation studies, we examined three model-based centrality detection indices (rater infit statistics, residual–expected correlations, and rater threshold SDs) as well as the raw-score SD in terms of their efficiency of reconstructing the true rater centrality rank ordering. Findings confirmed the superiority of the residual–expected correlation, rater threshold SD, and raw-score SD statistics, particularly when the examinee sample size was large and the number of scoring criteria was high. By contrast, the infit statistic results were much less consistent and, under conditions of large differences between criterion difficulties, suggested erroneous conclusions about raters’ central tendencies. Analyzing real rating data from a large-scale speaking performance assessment confirmed that infit statistics are unsuitable for identifying raters’ central tendencies. The discussion focuses on detecting centrality effects under different facets models and the indices’ implications for rater monitoring and fair performance assessment.
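Two of the indices compared in the abstract are straightforward to illustrate. The sketch below simulates hypothetical ratings on a 0–5 scale for three raters, one of which is deliberately "central" (its scores are compressed toward the middle categories), and computes the per-rater raw-score SD and a residual–expected correlation. The data, the rater setup, and the use of the across-rater mean as a stand-in for model-expected scores are all illustrative assumptions, not the paper's extended facets model; in the actual study the expected scores come from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical ratings on a 0-5 scale: 3 raters x 200 examinees.
# Rater 1 is "central": its scores are compressed toward the scale midpoint.
ability = rng.normal(0, 1, 200)
true_score = np.clip(np.round(2.5 + 1.2 * ability), 0, 5)
central_score = np.clip(np.round(2.5 + 0.4 * ability), 0, 5)
ratings = np.vstack([
    np.clip(true_score + rng.integers(-1, 2, 200), 0, 5),  # ordinary rater + noise
    central_score,                                          # central rater
    true_score,                                             # accurate rater
])

# Crude stand-in for model-expected scores (a facets model would supply these).
expected = ratings.mean(axis=0)

for r in range(ratings.shape[0]):
    residuals = ratings[r] - expected
    raw_sd = ratings[r].std(ddof=1)  # raw-score SD: low values flag centrality
    # Residual-expected correlation: markedly negative values flag centrality,
    # because a central rater under-scores strong examinees and over-scores weak ones.
    res_exp_corr = np.corrcoef(residuals, expected)[0, 1]
    print(f"rater {r}: raw-score SD = {raw_sd:.2f}, "
          f"residual-expected corr = {res_exp_corr:.2f}")
```

Under this setup the central rater (index 1) shows both the smallest raw-score SD and the most negative residual–expected correlation, which is the pattern these two indices exploit; the paper's point is that such indices recover the centrality rank ordering more reliably than infit statistics do.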