Rater variability across examinees and rating criteria in paired speaking assessment

Studies in Language Assessment (IF 0.6, Q4, Linguistics) | Pub Date: 2018-01-01 | DOI: 10.58379/yvwq3768

S. Youn

Citations: 7

Abstract

This study investigates rater variability with regard to examinees’ levels and rating criteria in paired speaking assessment. Twelve raters completed rater training and scored 102 examinees’ paired speaking performances using analytical rating criteria that reflect various features of paired speaking performance. The raters were fairly consistent in their overall ratings but differed in their severity. Bias analyses using many-facet Rasch measurement revealed a higher level of rater bias interaction for the rating criteria than for the examinees’ levels or the pairing type, which reflects the level difference between the two examinees. In particular, the most challenging rating category, Language Use, attracted significant bias interactions. However, the raters did not display more frequent bias interactions on the interaction-specific rating categories, such as Engaging with Interaction and Turn Organization. Furthermore, the raters tended to reverse their severity patterns across the rating categories. In the rater-by-examinee bias interactions, the raters tended to show more frequent bias toward the low-level examinees. However, no significant rater bias was found for the pairing type that consisted of high-level and low-level examinees. These findings have implications for rater training in paired speaking assessment.

Source journal

Studies in Language Assessment

Self-citation rate

0.00%

Articles published