Pub Date: 2019-04-24
DOI: 10.4324/9780203709740-10
Shuai Li, Yali Feng, Ting-Sheng Wen
It is common practice for Chinese language instructors to develop their own instruments for classroom assessment and to make important pedagogical decisions (e.g., assigning grades) based on them. However, the quality of such instruments has rarely been discussed in the literature. This chapter examines the measurement quality of an instructor-developed test used as a final written exam in an undergraduate Chinese language course in the U.S. The test was designed to assess the linguistic knowledge taught in the course and contained 37 binary-scored (0/1) items and 17 constructed-response items. Two four-category rating scales were developed to evaluate the constructed responses. The examinees were 88 students enrolled in the course. Results showed that the overall measurement quality of the test was acceptable, as indicated by measures of difficulty, discrimination, reliability, and Rasch model fit. The two rating scales, however, were found to include more score categories than the data supported, suggesting measurement redundancy. The findings are intended to raise awareness among CSL (Chinese as a second language) instructors of the potential limitations of their self-developed assessment instruments.
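For readers unfamiliar with the classical indices named above, the following is a minimal sketch of how difficulty, discrimination, and reliability might be computed for binary-scored items. The abstract does not say which specific indices the authors used; this sketch assumes proportion correct for difficulty, the corrected item-total correlation for discrimination, and KR-20 for reliability. The function names and the toy score matrix are hypothetical and invented for illustration; they are not the study's data. Rasch model fit and rating scale functioning require dedicated software and are not covered here.

```python
from math import sqrt
from statistics import mean, pvariance


def item_difficulty(responses):
    """Proportion of examinees who answered each item correctly (0/1 scoring)."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def item_discrimination(responses):
    """Corrected item-total correlation: item score vs. total on the other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def kr20(responses):
    """Kuder-Richardson 20 reliability for binary-scored items."""
    k = len(responses[0])
    total_var = pvariance([sum(row) for row in responses])
    pq = sum(p * (1 - p) for p in item_difficulty(responses))
    return (k / (k - 1)) * (1 - pq / total_var)


# Hypothetical toy data: 6 examinees (rows) by 4 binary items (columns).
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

difficulty = item_difficulty(toy_scores)
discrimination = item_discrimination(toy_scores)
reliability = kr20(toy_scores)
```

In this sketch, a higher difficulty value means an easier item, and items with low or negative discrimination would be candidates for review.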
Title: Measurement Quality and Rating Scale Functioning of a CSL Classroom Assessment Instrument
Journal: 对外汉语研究
URL: https://doi.org/10.4324/9780203709740-10