S. Papageorgiou, Sha Wu, Ching-Ni Hsieh, Richard J. Tannenbaum, Mengmeng Cheng
The past decade has seen an emerging interest in aligning test scores to language proficiency levels of external performance scales or frameworks, such as the Common European Framework of Reference (CEFR). Such alignment is ultimately a claim about the interpretation of test scores in relation to external levels of language proficiency. To support such a claim, established procedures should be carefully implemented and documented, and multiple sources of evidence should be collected. This paper demonstrates the steps in building an argument for aligning the scores of an international English language proficiency test to the levels of China’s Standards of English Language Ability, or CSE, a localized language proficiency framework for English as a foreign language. Aligning an international examination to a localized framework serves to make the test score more relevant to the intended context of its use. We discuss the contextual issues that should be considered when interpreting test scores in relation to local proficiency levels, given the potential impact of score-based decisions on individuals and institutions. The implications for similar alignment research will also be presented.
{"title":"Aligning Language Test Scores to Local Proficiency Levels: The Case of China’s Standards of English Language Ability (CSE)","authors":"S. Papageorgiou, Sha Wu, Ching-Ni Hsieh, Richard J. Tannenbaum, Mengmeng Cheng","doi":"10.59863/ciph5850","DOIUrl":"https://doi.org/10.59863/ciph5850","url":null,"abstract":"The past decade has seen an emerging interest in aligning test scores to language proficiency levels of external performance scales or frameworks, such as the Common European Framework of Reference (CEFR). Such alignment is ultimately a claim about the interpretation of test scores in relation to external levels of language proficiency. To support such a claim, established procedures should be carefully implemented and documented, and multiple sources of evidence should be collected. This paper demonstrates the steps in building an argument for aligning the scores of an international English language proficiency test to the levels of China’s Standards of English Language Ability, or CSE, a localized language proficiency framework for English as a foreign language. Aligning an international examination to a localized framework serves to make the test score more relevant to the intended context of its use. We discuss the contextual issues that should be considered when interpreting test scores in relation to local proficiency levels, given the potential impact of score-based decisions on individuals and institutions. The implications for similar alignment research will also be presented.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91248352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Person-fit statistics (PFSs) have been suggested as a tool to detect cheating in large-scale testing, and this study investigates their potential for this application. Most PFSs are equally sensitive to scores that appear spuriously high or spuriously low. Xia & Zheng introduced four PFSs that are meant to be more sensitive to spuriously high scores and therefore may be more appropriate for detecting cheating. Comparing the power of these weighted PFSs against the power of traditional PFSs to detect cheating shows that there is no single best statistic in all or most scenarios, and in most scenarios, most examinees flagged as cheating by person fit analysis did not cheat. Implications for operational use of PFSs to detect cheating are discussed.
{"title":"Appraising Traditional and Purpose-built Person Fit Statistics’ Power to Detect Cheating","authors":"Sanford R. Student","doi":"10.59863/gypv1534","DOIUrl":"https://doi.org/10.59863/gypv1534","url":null,"abstract":"Person-fit statistics (PFSs) have been suggested as a tool to detect cheating in large-scale testing, and this study investigates their potential for this application. Most PFSs are equally sensitive to scores that appear spuriously high or spuriously low. Xia & Zheng introduced four PFSs that are meant to be more sensitive to spuriously high scores and therefore may be more appropriate for detecting cheating. Comparing the power of these weighted PFSs against the power of traditional PFSs to detect cheating shows that there is no single best statistic in all or most scenarios, and in most scenarios, most examinees flagged as cheating by person fit analysis did not cheat. Implications for operational use of PFSs to detect cheating are discussed.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73640915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiros Papageorgiou, Shang-cheng Wu, Ching-Ni Hsieh, Richard Tannenbaum, Mengmeng Cheng
Over the past decade, there has been growing interest in establishing alignment between test scores and external language proficiency scales or frameworks, such as the Common European Framework of Reference (CEFR). Alignment is essentially a process of validating a claim, namely the claimed correspondence between test scores and external levels of language proficiency. To support such a claim, established procedures should be carefully implemented and documented, and evidence from multiple sources should be collected. This study sets out to align an international English proficiency test with China’s localized English proficiency standard, China’s Standards of English Language Ability (CSE), and demonstrates the steps in stating and validating the claim. Aligning the scores of an international examination to a localized standard helps make the interpretation of test scores more relevant to the intended context of use. Given the potential impact of score-based decisions on individuals and institutions, the study also discusses the contextual issues that should be considered when test scores are interpreted in terms of localized proficiency levels. The article concludes with suggestions for conducting similar alignment research.
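{"title":"Aligning Language Test Scores to Local Proficiency Levels: The Case of China’s Standards of English Language Ability (CSE)","authors":"Spiros Papageorgiou, Shang-cheng Wu, Ching-Ni Hsieh, Richard Tannenbaum, Mengmeng Cheng","doi":"10.59863/tqes5013","DOIUrl":"https://doi.org/10.59863/tqes5013","url":null,"abstract":"Over the past decade, there has been growing interest in establishing alignment between test scores and external language proficiency scales or frameworks, such as the Common European Framework of Reference (CEFR). Alignment is essentially a process of validating a claim, namely the claimed correspondence between test scores and external levels of language proficiency. To support such a claim, established procedures should be carefully implemented and documented, and evidence from multiple sources should be collected. This study sets out to align an international English proficiency test with China’s localized English proficiency standard, China’s Standards of English Language Ability (CSE), and demonstrates the steps in stating and validating the claim. Aligning the scores of an international examination to a localized standard helps make the interpretation of test scores more relevant to the intended context of use. Given the potential impact of score-based decisions on individuals and institutions, the study also discusses the contextual issues that should be considered when test scores are interpreted in terms of localized proficiency levels. The article concludes with suggestions for conducting similar alignment research.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89678500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}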
Using Macao-PISA 2012 data collected from socio-economically disadvantaged and advantaged students, this study identified two sets of important learning factors that distinguished between low- and high-performing disadvantaged students, and between low- and high-performing advantaged students, respectively. The findings of this research contribute to a better understanding of the reasons for Macao’s high-quality and equitable education as compared to other regions with high mathematical literacy performance while also revealing the crux of small inequities in its education system. The analysis method used in this paper provides a paradigm for data mining research using large-scale assessment data and helps researchers better grasp the state of education at the local level.
{"title":"Using Data Mining to Explore Factors That Distinguish Between Students With High and Low Mathematical Literacy Performance — An Example With Socio-Economically Disadvantaged and Advantaged Students in Macao","authors":"Wa Kit Sou, K. Cheung, Man Kai Leong, Soi-kei Mak","doi":"10.59863/bmak8596","DOIUrl":"https://doi.org/10.59863/bmak8596","url":null,"abstract":"Using Macao-PISA 2012 data collected from socio-economically disadvantaged and advantaged students, this study identified two sets of important learning factors that distinguished between low- and high-performing disadvantaged students, and between low- and high-performing advantaged students, respectively. The findings of this research contribute to a better understanding of the reasons for Macao’s high-quality and equitable education as compared to other regions with high mathematical literacy performance while also revealing the crux of small inequities in its education system. The analysis method used in this paper provides a paradigm for data mining research using large-scale assessment data and helps researchers better grasp the state of education at the local level.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"69 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81435727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher Barr, D. August, Lauren Artzi, Coleen D. Carlson, D. Francis
This study reports on the design and validation of the Test of Academic Vocabulary in English (TAVE), a vertically equated assessment of academic vocabulary that generalizes to a meaningful corpus of words and is measured on a developmental scale. The study builds on previous pilot work, uses a larger sample of English learners (ELs) and non-EL students in grades 3 to 8 (n = 2,238) from a large urban Southwestern region, and describes the rationale and process of corpus and assessment development. The findings indicate that the academic vocabulary construct is unidimensional and has both strong reliability and criterion validity. The TAVE was also able to discriminate performance by grade level in the lower grades. For research, this study identifies a developmental metric on which student scores not only generalize back to a meaningful corpus of words found in academic texts but also yield specific expectations about which words in the corpus students would know. For practice, this study offers a tool that provides scores that are directly comparable across grades and could potentially be used to track growth over both the short and the long term.
{"title":"Designing and Validating a Criterion-Referenced Assessment of High Utility Academic Vocabulary in English for Elementary and Middle School Students","authors":"Christopher Barr, D. August, Lauren Artzi, Coleen D. Carlson, D. Francis","doi":"10.59863/lzgi8844","DOIUrl":"https://doi.org/10.59863/lzgi8844","url":null,"abstract":"This study reports on the design and validation of a vertically equated assessment of academic vocabulary that generalizes to a meaningful corpus of words and is measured on a developmental scale: the Test of Academic Vocabulary in English (TAVE). The study builds on previous pilot work and uses a larger sample of students who are English learners (ELs) and non-EL students in grades 3 to 8 (n= 2,238) from a large urban Southwestern region, and describes the rationale and process of corpus and assessment development. A review of the findings from the study found the academic vocabulary construct to be unidimensional and to have both strong reliability and criterion validity. The TAVE was also able to discriminate performance by grade level in lower grades. For research, this study identifies a developmental metric where student scores not only generalize back to a meaningful corpus of words found in academic texts, but also offers specific expectations about which words students would know in the corpus. For practice, this study offers a tool that provides scores that are directly comparable across grades and could potentially be used to track growth across both the short and long term.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75323194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher Barr, Diane L. August, Lauren Artzi, Coleen Carlson, David Francis
This study developed the Test of Academic Vocabulary in English (TAVE) and reports on its design and validation. The test is intended to measure students’ knowledge of academic vocabulary; the forms for different grades were vertically equated, and results can be placed on a single developmental scale. In addition, test scores generalize to a meaningful corpus. Building on previous pilot work, the study used a sample of 2,238 students in grades 3 to 8 from a large city in the Southwest, including both English learners (ELs) and non-English learners, and it describes the rationale and process of corpus and test development. The results show that the TAVE has a unidimensional structure and high reliability and criterion-related validity, and that it discriminates well between students’ vocabulary knowledge across the lower grades. For research, the study proposes a developmental metric that allows students’ test scores to be generalized to a meaningful corpus, one that covers words common in academic texts, and that yields the probability that a student knows each word in the corpus. For practice, the study provides a test whose scores are comparable across grades and that can be used to track changes in students’ vocabulary over the short or long term.
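{"title":"Development and Validation of a Criterion-Referenced Test of English Academic Vocabulary for Primary and Secondary School Students","authors":"Christopher Barr, Diane L. August, Lauren Artzi, Coleen Carlson, David Francis","doi":"10.59863/jzmp8956","DOIUrl":"https://doi.org/10.59863/jzmp8956","url":null,"abstract":"This study developed the Test of Academic Vocabulary in English (TAVE) and reports on its design and validation. The test is intended to measure students’ knowledge of academic vocabulary; the forms for different grades were vertically equated, and results can be placed on a single developmental scale. In addition, test scores generalize to a meaningful corpus. Building on previous pilot work, the study used a sample of 2,238 students in grades 3 to 8 from a large city in the Southwest, including both English learners (ELs) and non-English learners, and it describes the rationale and process of corpus and test development. The results show that the TAVE has a unidimensional structure and high reliability and criterion-related validity, and that it discriminates well between students’ vocabulary knowledge across the lower grades. For research, the study proposes a developmental metric that allows students’ test scores to be generalized to a meaningful corpus, one that covers words common in academic texts, and that yields the probability that a student knows each word in the corpus. For practice, the study provides a test whose scores are comparable across grades and that can be used to track changes in students’ vocabulary over the short or long term.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75199347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}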
The present study used the mixed item response theory (IRT) model to identify qualitatively distinct subgroups of sixth-grade students with respect to their performance on word problems on speed. A total of 345 Singaporean students and 361 Chinese students took a problem-solving test on speed. The mixed IRT analysis revealed two latent classes — the algebra proficient group and the algebra novice group. The algebra proficient group was more likely to use traditional algebraic and arithmetic strategies to solve the problems, whereas the algebra novice group was more likely to use model drawing, unitary, and guess-and-check strategies, in addition to using traditional arithmetic and algebraic strategies. Findings of the study indicate that a greater variety of problem-solving strategies could be encouraged in upper primary schools to help students make connections among these strategies, in particular, between these strategies and the abstract algebraic strategies, and finally to achieve a successful transition from arithmetic to algebra learning.
{"title":"Analysis of a Mathematical Problem-Solving Test on Speed and Students' Strategies: A Mixed Item Response Theory Approach","authors":"Chunlian Jiang, Do-Hong Kim, Chuang Wang","doi":"10.59863/zhfp5485","DOIUrl":"https://doi.org/10.59863/zhfp5485","url":null,"abstract":"The present study used the mixed item response theory (IRT) model to identify qualitatively distinct subgroups of sixth-grade students with respect to their performance on word problems on speed. A total of 345 Singaporean students and 361 Chinese students took a problem-solving test on speed. The mixed IRT analysis revealed two latent classes — the algebra proficient group and the algebra novice group. The algebra proficient group was more likely to use traditional algebraic and arithmetic strategies to solve the problems, whereas the algebra novice group was more likely to use model drawing, unitary, and guess-and-check strategies, in addition to using traditional arithmetic and algebraic strategies. Findings of the study indicate that a greater variety of problem-solving strategies could be encouraged in upper primary schools to help students make connections among these strategies, in particular, between these strategies and the abstract algebraic strategies, and finally to achieve a successful transition from arithmetic to algebra learning.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74810744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}