Evaluating rater judgments on ETIC Advanced writing tasks: An application of generalizability theory and Many-Facets Rasch Model

Studies in Language Assessment (IF 0.1, Q4 LINGUISTICS) · Pub Date: 2019-01-01 · DOI: 10.58379/vmak1620
Jiayu Wang, Kaizhou Luo
Citations: 4

Abstract

Developed by China Language Assessment (CLA), the English Test for International Communication Advanced (ETIC Advanced) assesses one’s ability to perform English language tasks in international workplace contexts. ETIC Advanced is composed solely of writing and speaking tasks, featuring an authentic constructed-response format. Eliciting extended responses from candidates, however, requires human raters to make judgments, raising a critical issue of rating quality. This study aimed to evaluate rater judgments on the writing tasks of ETIC Advanced. Data in the study comprised scores from 186 candidates who performed all three writing tasks: Letter Writing, Report Writing, and Proposal Writing (n = 3,348 ratings). Rating was conducted by six certified raters using a six-point, three-category analytic rating scale. Generalizability theory (GT) and the Many-Facets Rasch Model (MFRM) were applied to analyse the scores from different perspectives. Results from GT indicated that raters’ inconsistency and their interactions with other facets accounted for a relatively low proportion of the overall score variance, and that the ratings sufficed for generalization. MFRM analysis revealed that the six raters differed significantly in severity, yet each remained internally consistent in their judgments. Bias analyses indicated that raters tended to assign more biased scores to low-proficiency candidates and to the Content category of the rating scale. The study demonstrates the use of both GT and MFRM to evaluate rater judgments on language performance tests. The findings have implications for ETIC rater training.
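To illustrate the variance-decomposition logic behind the GT analysis, the sketch below estimates variance components for the simplest fully crossed persons × raters design with one score per cell. This is a minimal illustration only: the study's actual design also involves tasks and rating-scale categories, and the data here are simulated, not the study's.

```python
import numpy as np

def g_study(scores):
    """Estimate variance components for a fully crossed
    persons x raters G-study design (one score per cell),
    via expected mean squares from a two-way ANOVA."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # Mean squares from the two-way ANOVA without replication
    ms_p = n_r * ((person_means - grand) ** 2).sum() / (n_p - 1)
    ms_r = n_p * ((rater_means - grand) ** 2).sum() / (n_r - 1)
    resid = scores - person_means[:, None] - rater_means[None, :] + grand
    ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions for the variance components
    var_pr = ms_pr                        # person x rater interaction + error
    var_p = max((ms_p - ms_pr) / n_r, 0)  # true-score (person) variance
    var_r = max((ms_r - ms_pr) / n_p, 0)  # rater severity variance
    return var_p, var_r, var_pr

def g_coefficient(var_p, var_pr, n_r):
    """Relative generalizability coefficient for a D-study
    averaging over n_r raters."""
    return var_p / (var_p + var_pr / n_r)

# Simulated example: 50 candidates rated by 6 raters
rng = np.random.default_rng(0)
scores = (3.0
          + rng.normal(0, 1.0, (50, 1))    # person (true-score) effect
          + rng.normal(0, 0.3, (1, 6))     # rater severity effect
          + rng.normal(0, 0.5, (50, 6)))   # interaction + error
var_p, var_r, var_pr = g_study(scores)
print(g_coefficient(var_p, var_pr, n_r=6))
```

A small rater-variance component relative to person variance, together with a generalizability coefficient near 1, corresponds to the paper's finding that rater inconsistency accounted for a relatively low share of score variance and that the ratings sufficed for generalization.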