{"title":"Comparing two formats of data-driven rating scales for classroom assessment of pragmatic performance with roleplays","authors":"Yunwen Su, Sun-Young Shin","doi":"10.1177/02655322231210217","DOIUrl":null,"url":null,"abstract":"Rating scales that language testers design should be tailored to the specific test purpose and score use as well as reflect the target construct. Researchers have long argued for the value of data-driven scales for classroom performance assessment, because they are specific to pedagogical tasks and objectives, have rich descriptors to offer useful diagnostic information, and exhibit robust content representativeness and stable measurement properties. This sequential mixed methods study compares two data-driven rating scales with multiple criteria that use different formats for pragmatic performance. They were developed using roleplays performed by 43 second-language learners of Mandarin—the hierarchical-binary (HB) scale, developed through close analysis of performance data, and the multi-trait (MT) scale derived from the HB, which has the same criteria but takes the format of an analytic scale. Results revealed the influence of format, albeit to a limited extent: MT showed a marginal advantage over HB in terms of overall reliability, practicality, and discriminatory power, though measurement properties of the two scales were largely comparable. All raters were positive about the pedagogical value of both scales. This study reveals that rater perceptions of the ease of use and effectiveness of both scales provide further insights into scale functioning.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":"52 1","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Language Testing","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1177/02655322231210217","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
引用次数: 0
Abstract
Rating scales that language testers design should be tailored to the specific test purpose and score use as well as reflect the target construct. Researchers have long argued for the value of data-driven scales for classroom performance assessment, because they are specific to pedagogical tasks and objectives, have rich descriptors to offer useful diagnostic information, and exhibit robust content representativeness and stable measurement properties. This sequential mixed methods study compares two data-driven rating scales with multiple criteria that use different formats for pragmatic performance. They were developed using roleplays performed by 43 second-language learners of Mandarin—the hierarchical-binary (HB) scale, developed through close analysis of performance data, and the multi-trait (MT) scale derived from the HB, which has the same criteria but takes the format of an analytic scale. Results revealed the influence of format, albeit to a limited extent: MT showed a marginal advantage over HB in terms of overall reliability, practicality, and discriminatory power, though measurement properties of the two scales were largely comparable. All raters were positive about the pedagogical value of both scales. This study reveals that rater perceptions of the ease of use and effectiveness of both scales provide further insights into scale functioning.
期刊介绍:
Language Testing is a fully peer reviewed international journal that publishes original research and review articles on language testing and assessment. It provides a forum for the exchange of ideas and information between people working in the fields of first and second language testing and assessment. This includes researchers and practitioners in EFL and ESL testing, and assessment in child language acquisition and language pathology. In addition, special attention is focused on issues of testing theory, experimental investigations, and the following up of practical implications.