Comparative judgement (CJ) is a method of assessment in which judges perform paired comparisons of pieces of student work and decide which one is “better”. CJ offers many potential benefits for the writing assessment community, including reliability, flexibility, and efficiency. However, our review of the literature on CJ’s application to L2 writing assessment finds that while existing studies have established the plausibility of using CJ in this context, they provide little indication of the conditions under which the method is most likely to prove useful. In particular, because they focus on the assessment of relatively short texts that cover a wide proficiency range and respond to a single essay prompt, they leave unresolved the question of how such textual factors affect CJ’s reliability and validity. To address this, we conduct two studies exploring the reliability and validity of a community-driven form of CJ for evaluating L2 texts that were longer, covered a narrower proficiency range, and were more topically diverse than those used in earlier studies. Our results suggest that CJ remains reliable under these conditions. In addition, comparison with rubric-based assessment using CEFR scales suggests that the CJ approach also has an acceptable level of validity.