Evaluating rater judgments on ETIC Advanced writing tasks: An application of generalizability theory and Many-Facets Rasch Model

Studies in Language Assessment (IF 0.1, Q4 LINGUISTICS) · Pub Date: 2019-01-01 · DOI: 10.58379/vmak1620
Jiayu Wang, Kaizhou Luo
Citations: 4

Abstract

Developed by China Language Assessment (CLA), the English Test for International Communication Advanced (ETIC Advanced) assesses one’s ability to perform English language tasks in international workplace contexts. ETIC Advanced is composed solely of writing and speaking tasks, featuring an authentic constructed-response format. Eliciting extended responses from candidates, however, requires human raters to make judgments, raising a critical issue of rating quality. This study aimed to evaluate rater judgments on the writing tasks of ETIC Advanced. Data in the study comprised scores from 186 candidates who performed all three writing tasks: Letter Writing, Report Writing, and Proposal Writing (n = 3,348 ratings). Rating was conducted by six certified raters using a six-point, three-category analytic rating scale. Generalizability theory (GT) and the Many-Facets Rasch Model (MFRM) were applied to analyse the scores from different perspectives. Results from GT indicated that raters’ inconsistency and their interactions with other facets accounted for a relatively low proportion of the overall score variance, and that the ratings sufficed for generalization. MFRM analysis revealed that the six raters differed significantly in severity, yet each remained internally consistent in their judgments. Bias analyses indicated that raters tended to assign more biased scores to low-proficiency candidates and to the Content category of the rating scale. The study demonstrates the use of both GT and MFRM to evaluate rater judgments on language performance tests. The findings have implications for ETIC rater training.
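To illustrate the variance-decomposition logic behind the GT analysis, the sketch below estimates variance components for the simplest fully crossed persons × raters design with one score per cell. This is a minimal illustration only: the study's actual design also involves tasks and rating-scale categories, and the data here are simulated, not the study's.

```python
import numpy as np

def g_study(scores):
    """Estimate variance components for a fully crossed
    persons x raters G-study design (one score per cell),
    via expected mean squares from a two-way ANOVA."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # Mean squares from the two-way ANOVA without replication
    ms_p = n_r * ((person_means - grand) ** 2).sum() / (n_p - 1)
    ms_r = n_p * ((rater_means - grand) ** 2).sum() / (n_r - 1)
    resid = scores - person_means[:, None] - rater_means[None, :] + grand
    ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions for the variance components
    var_pr = ms_pr                        # person x rater interaction + error
    var_p = max((ms_p - ms_pr) / n_r, 0)  # true-score (person) variance
    var_r = max((ms_r - ms_pr) / n_p, 0)  # rater severity variance
    return var_p, var_r, var_pr

def g_coefficient(var_p, var_pr, n_r):
    """Relative generalizability coefficient for a D-study
    averaging over n_r raters."""
    return var_p / (var_p + var_pr / n_r)

# Simulated example: 50 candidates rated by 6 raters
rng = np.random.default_rng(0)
scores = (3.0
          + rng.normal(0, 1.0, (50, 1))    # person (true-score) effect
          + rng.normal(0, 0.3, (1, 6))     # rater severity effect
          + rng.normal(0, 0.5, (50, 6)))   # interaction + error
var_p, var_r, var_pr = g_study(scores)
print(g_coefficient(var_p, var_pr, n_r=6))
```

A small rater-variance component relative to person variance, together with a generalizability coefficient near 1, corresponds to the paper's finding that rater inconsistency accounted for a relatively low share of score variance and that the ratings sufficed for generalization.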