Is CJ a valid, reliable form of L2 writing assessment when texts are long, homogeneous in proficiency, and feature heterogeneous prompts?

IF 4.2 · Zone 1 (Literature) · JCR Q1, EDUCATION & EDUCATIONAL RESEARCH · Assessing Writing · Pub Date: 2024-04-01 · DOI: 10.1016/j.asw.2024.100843

Peter Thwaites, Charalambos Kollias, Magali Paquot

Assessing Writing, volume 60, Article 100843. https://www.sciencedirect.com/science/article/pii/S1075293524000369

Citations: 0

Abstract

Comparative judgement (CJ) is a method of assessment in which judges perform paired comparisons of pieces of student work and decide which one is “better”. CJ has many potential benefits for the writing assessment community, including its reliability, flexibility, and efficiency. However, by reviewing the literature on CJ’s application to L2 writing assessment, we find that while existing studies have established the plausibility of using CJ in this context, they provide little indication of the conditions under which the method is most likely to prove useful. In particular, by focusing on the assessment of relatively short texts, covering a wide proficiency range, and using a single essay prompt, they leave unresolved the question of how such textual factors affect CJ’s reliability and validity. To address this, we conduct two studies exploring the reliability and validity of a community-driven form of CJ for evaluating L2 texts which were longer, featured a narrower proficiency range, and were more topically diverse than those examined in earlier studies. Our results suggest that CJ remains reliable under these conditions. In addition, comparison with rubric-based assessment using CEFR scales suggests that the CJ approach also has an acceptable level of validity.
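The abstract does not say how the paired judgements are turned into a scale. As a rough illustration only, CJ data are commonly scaled with a Bradley-Terry model, and the sketch below assumes that model. The function name `fit_bradley_terry` and the toy judgements are hypothetical and are not taken from the study.

```python
import math


def fit_bradley_terry(n_items, comparisons, iterations=200):
    """Estimate Bradley-Terry log-strengths from (winner, loser) index pairs.

    Assumption: every item wins at least once and loses at least once,
    otherwise its estimate is not finite and math.log will fail.
    """
    wins = [0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strengths = [1.0] * n_items
    for _ in range(iterations):
        # Minorisation-maximisation update: wins divided by the sum over
        # opponents of n_ij / (strength_i + strength_j).
        denom = [0.0] * n_items
        for (i, j), n in pair_counts.items():
            share = n / (strengths[i] + strengths[j])
            denom[i] += share
            denom[j] += share
        strengths = [wins[i] / denom[i] for i in range(n_items)]
        # Rescale so the strengths keep a fixed sum (the scale is arbitrary).
        total = sum(strengths)
        strengths = [s * n_items / total for s in strengths]
    return [math.log(s) for s in strengths]


# Hypothetical judgements on three texts: each tuple is (preferred, other).
judgements = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (2, 0),
]
scores = fit_bradley_terry(3, judgements)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 2) for s in scores])
```

In this toy data every pair is judged three times, so the estimated order simply follows the number of wins. With the sparser, unbalanced pairings typical of real CJ sessions, the model also takes into account which opponents each text was compared with.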


Source journal

Assessing Writing Multiple-

CiteScore

6.00

Self-citation rate

17.90%

Articles published

About the journal: Assessing Writing is a refereed international journal providing a forum for ideas, research and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised forms of) testing of writing, alternative performance assessments (such as portfolios), workplace sampling and classroom assessment. The journal focuses on all stages of the writing assessment process, including needs evaluation, assessment creation, implementation, and validation, and test development.