This study addresses a critical need in large-scale L2 writing assessment by emphasizing the importance of tailoring assessments to specific teaching and learning contexts. Focusing on the CET-4 writing test in China, the research unfolded in two phases. In Phase I, an empirically developed analytic rating scale designed for the CET-4 writing test was validated: twenty-one raters used the scale to rate 30 essays, and the rating data were analyzed with the Many-Facet Rasch Model (MFRM). The results demonstrate that the scale effectively differentiates examinees' writing performance, ensures consistency among raters, and mitigates rater variation at both the individual and group levels. In Phase II, the validated scale was applied to score 142 CET-4 writing scripts, and hierarchical and K-Means cluster analyses revealed three distinct score profiles. These findings are significant both for the CET-4 writing test and for other large-scale L2 writing assessments. Theoretically, the study introduces a score-profile perspective that deepens our understanding of learners' performance in large-scale L2 writing assessment. Methodologically, it presents a framework that integrates rating-scale validation with the identification of distinct score clusters, offering a more fine-grained approach to tailoring assessments to specific learning contexts.
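As a point of reference, MFRM analyses of rating data of this kind are conventionally expressed in a three-facet form following Linacre's formulation; the sketch below assumes examinee, rater, and criterion facets, since the abstract does not state the exact facet specification used in the study:

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - C_j - D_i - F_k
\]

where \(P_{nijk}\) is the probability of examinee \(n\) receiving category \(k\) rather than \(k-1\) from rater \(j\) on criterion \(i\); \(B_n\) is examinee ability, \(C_j\) rater severity, \(D_i\) criterion difficulty, and \(F_k\) the step difficulty of moving from category \(k-1\) to \(k\). In this framework, rater variation surfaces directly as estimates of \(C_j\), which is what allows the scale's mitigation of rater effects to be examined at the individual and group levels.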
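Likewise, the Phase II analysis follows a familiar two-step clustering design (hierarchical clustering to suggest the number of clusters, then K-Means to refine the solution). The sketch below is illustrative only, assuming one row of per-criterion scale scores per script; the file name and data layout are hypothetical, as the study's actual variables are not specified in the abstract:

```python
# Illustrative two-step clustering over per-criterion scores (hypothetical data layout).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical input: one row per script, one column per scale criterion.
scores = np.loadtxt("cet4_scores.csv", delimiter=",", skiprows=1)
z = StandardScaler().fit_transform(scores)  # standardize criteria before clustering

# Step 1: Ward's hierarchical clustering; cutting the tree at k = 3
# mirrors the three-profile solution reported in the study.
tree = linkage(z, method="ward")
hier_labels = fcluster(tree, t=3, criterion="maxclust")

# Step 2: K-Means refines the three-cluster solution.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(z)
profiles = kmeans.cluster_centers_  # mean standardized score per criterion = one profile per cluster
```

Each row of `profiles` can then be read as a score profile: a pattern of relative strengths and weaknesses across the scale's criteria for one group of examinees.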