Communal factors in rater severity and consistency over time in high-stakes oral assessment

Language Testing (IF 2.4; Zone 1, Literature; JCR category: LANGUAGE & LINGUISTICS). Pub Date: 2024-04-10. DOI: 10.1177/02655322241239363

Reeta Neittaanmäki, Iasonas Lamprianou


Citations: 0

Abstract

This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated whether rater severity and consistency changed over that period and whether the changes could be explained by major changes in the rating system, such as the change of lead examiner, the modus of rating and training (on-site or remote), and the composition of the rater group. The data consisted of 45 rating sessions with 104 raters and 59,899 examinees and were analysed using the Many-Facets Rasch model and generalized linear mixed models. The analyses indicated that raters as a group became somewhat more lenient over time. In addition, the results showed that the rater community and its practices, the lead examiners, and the modus of rating and training can influence the rating behaviour. Finally, we elaborate on implications for both research and practice.


Source journal

Language Testing Multiple-

CiteScore: 6.70

Self-citation rate: 9.80%

Articles published:

Journal description: Language Testing is a fully peer-reviewed international journal that publishes original research and review articles on language testing and assessment. It provides a forum for the exchange of ideas and information between people working in the fields of first and second language testing and assessment. This includes researchers and practitioners in EFL and ESL testing, and assessment in child language acquisition and language pathology. In addition, special attention is focused on issues of testing theory, experimental investigations, and the following up of practical implications.