Chinese/English journal of educational measurement and evaluation: Latest Articles (Page 4)

Key Factors That Distinguish Problem-Solving Experts From Novices: A Data Mining Approach

Chinese/English journal of educational measurement and evaluation

Pub Date : 2022-08-01 DOI: 10.59863/zoge8786

Song Jin, K. Cheung, Pou-seong Sit

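Problem-solving competence in digital environments is widely regarded as one of the core skills of the 21st century. Many important factors affect performance in this competence: some are problem-specific and relate to the problem-solving process, whereas other, non-problem-specific factors relate to the problem solver's knowledge, skills, attitudes, and beliefs and to the learning environment. This study analyzes process data from a sample computer-based problem-solving test in the 2012 Programme for International Student Assessment (PISA) of the Organisation for Economic Co-operation and Development (OECD), mining for the key factors that classify students as "problem-solving experts" or "problem-solving novices." The PISA 2012 participants comprised 11,599 15-year-old students from 42 economies. Multilevel logistic regression was used to analyze the problem-solving process data and the student questionnaire data; the secondary data analysis used a data mining approach involving classification and regression trees. The results show that five important factors distinguish experts from novices.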
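The classification-tree idea named in the abstract can be illustrated with a minimal sketch of how a single tree node chooses a split. This is not the study's analysis: the abstract does not state the software, the variables, or the tree settings, so the feature values, the labels (1 = expert, 0 = novice), and the function names `gini` and `best_split` below are all hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n          # proportion of label 1
    return 2 * p * (1 - p)       # equals 1 - p**2 - (1 - p)**2


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct
    sorted feature values. If no split lowers the impurity, the
    threshold is None and the parent impurity is returned.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Hypothetical process feature (for example, a count of some logged action)
# and hypothetical expert/novice labels; not data from PISA 2012.
feature = [1, 2, 3, 6, 7, 8]
is_expert = [0, 0, 0, 1, 1, 1]
print(best_split(feature, is_expert))
```

A full tree would apply the same search recursively to each resulting subset and across all candidate features; the sketch shows only the impurity criterion at one node.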

Citations: 0

Examining the Power of Traditional and Purpose-Built Person-Fit Statistics to Detect Cheating

Chinese/English journal of educational measurement and evaluation

Pub Date : 2022-04-01 DOI: 10.59863/fdof7391

Sanford R. Student

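Person-fit statistics (PFSs) can be used to detect examinee cheating in large-scale testing, and this study examines the feasibility of that application. Most PFSs are equally sensitive to spuriously high scores and spuriously low scores. Xia and Zheng (2018) introduced four PFSs that are more sensitive to spuriously high scores and may therefore be better suited to detecting cheating. This study compares the power of the weighted PFSs with that of traditional PFSs to detect cheating. The results show that no single statistic is best in all or most scenarios and that, in most scenarios, most examinees flagged as cheating by person-fit analysis did not cheat. The paper closes with a discussion of the practical implications of using PFSs to detect cheating.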

Citations: 0

Aligning Language Test Scores to Local Proficiency Levels: The Case of China’s Standards of English Language Ability (CSE)

Chinese/English journal of educational measurement and evaluation

Pub Date : 2022-04-01 DOI: 10.59863/ciph5850

S. Papageorgiou, Sha Wu, Ching-Ni Hsieh, Richard J. Tannenbaum, Mengmeng Cheng

The past decade has seen an emerging interest in aligning test scores to language proficiency levels of external performance scales or frameworks, such as the Common European Framework of Reference (CEFR). Such alignment is ultimately a claim about the interpretation of test scores in relation to external levels of language proficiency. To support such a claim, established procedures should be carefully implemented and documented, and multiple sources of evidence should be collected. This paper demonstrates the steps in building an argument for aligning the scores of an international English language proficiency test to the levels of China’s Standards of English Language Ability, or CSE, a localized language proficiency framework for English as a foreign language. Aligning an international examination to a localized framework serves to make the test score more relevant to the intended context of its use. We discuss the contextual issues that should be considered when interpreting test scores in relation to local proficiency levels, given the potential impact of score-based decisions on individuals and institutions. The implications for similar alignment research will also be presented.

Citations: 0

Appraising Traditional and Purpose-built Person Fit Statistics’ Power to Detect Cheating

Chinese/English journal of educational measurement and evaluation

Pub Date : 2022-04-01 DOI: 10.59863/gypv1534

Sanford R. Student

Person-fit statistics (PFSs) have been suggested as a tool to detect cheating in large-scale testing, and this study investigates their potential for this application. Most PFSs are equally sensitive to scores that appear spuriously high or spuriously low. Xia and Zheng (2018) introduced four PFSs that are meant to be more sensitive to spuriously high scores and therefore may be more appropriate for detecting cheating. Comparing the power of these weighted PFSs against the power of traditional PFSs to detect cheating shows that there is no single best statistic in all or most scenarios, and in most scenarios, most examinees flagged as cheating by person-fit analysis did not cheat. Implications for operational use of PFSs to detect cheating are discussed.
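To make the idea of a person-fit statistic concrete, the sketch below computes the standardized log-likelihood statistic, commonly written l_z, a widely used traditional PFS. It is offered as an illustration only: the abstract does not say which traditional statistics were compared, this is not one of the weighted statistics attributed to Xia and Zheng, and the item probabilities and response patterns are hypothetical.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic (l_z).

    responses: 0/1 item scores for one examinee.
    probs: model-implied probabilities of a correct response to each
           item at the examinee's ability estimate (each strictly
           between 0 and 1).
    Large negative values indicate a response pattern that is unlikely
    under the model.
    """
    if len(responses) != len(probs):
        raise ValueError("responses and probs must have the same length")
    log_lik = expected = variance = 0.0
    for u, p in zip(responses, probs):
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    if variance == 0.0:
        # Happens when every probability is exactly 0.5.
        raise ValueError("l_z is undefined when the variance term is zero")
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical probabilities for six items ordered from easy to hard.
probs = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]

consistent = [1, 1, 1, 0, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 0, 1, 1, 1]    # easy items wrong, hard items right

print(lz_statistic(consistent, probs))  # positive: pattern fits the model
print(lz_statistic(aberrant, probs))    # strongly negative: misfit
```

A negative l_z flags misfit in general, not cheating in particular, which is consistent with the abstract's caution that most flagged examinees did not cheat.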

Citations: 0

A Study on Aligning Language Test Scores With Local Proficiency Levels: The Case of China’s Standards of English Language Ability (CSE)

Chinese/English journal of educational measurement and evaluation

Pub Date : 2022-04-01 DOI: 10.59863/tqes5013

Spiros Papageorgiou, Shang-cheng Wu, Ching-Ni Hsieh, Richard Tannenbaum, Mengmeng Cheng

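Over the past decade, interest has grown in aligning test scores with external language proficiency scales or frameworks, such as the Common European Framework of Reference (CEFR). Alignment is essentially a process of validating a claim, namely the claimed correspondence between test scores and external language proficiency levels. Studies should carefully implement and document established procedures and should collect evidence from multiple sources to support the claimed correspondence. This study aims to align an international English proficiency test with China's localized English proficiency standard, China’s Standards of English Language Ability (CSE), and demonstrates the steps for making and validating the claim. Aligning international test scores to a localized standard helps make score interpretations more relevant to the intended context of use. Given the potential impact of score-based decisions on individuals and institutions, the study also discusses the contextual issues to consider when interpreting test scores in terms of localized proficiency levels. The article concludes with some suggestions for conducting similar alignment studies.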

Citations: 0

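Using Data Mining to Explore Factors That Distinguish Between Students With High and Low Mathematical Literacy Performance: The Case of Socio-Economically Disadvantaged and Advantaged Students in Macao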

Chinese/English journal of educational measurement and evaluation

Pub Date : 2021-12-01 DOI: 10.59863/saaw8486

Wa Kit Sou, Kwok Cheung Cheung, Man Kai Leong, Soi-kei Mak

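The Macao PISA 2012 report shows that Macao is an economy with high-quality and equitable education. However, socio-economically disadvantaged students generally find it harder to succeed than advantaged students, so inequities may exist within an equitable education system. Using the Macao PISA 2012 assessment data of disadvantaged and advantaged students separately, this study mined the important learning factors that distinguish low-performing from high-performing disadvantaged students, and those that distinguish low-performing from high-performing advantaged students. The findings offer a better understanding of why Macao's education is high in quality and equity relative to other regions with high mathematical literacy performance, and they also reveal the crux of the slight inequities in an equitable education system.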

Citations: 0

Using Data Mining to Explore Factors That Distinguish Between Students With High and Low Mathematical Literacy Performance: An Example With Socio-Economically Disadvantaged and Advantaged Students in Macao

Chinese/English journal of educational measurement and evaluation

Pub Date : 2021-12-01 DOI: 10.59863/bmak8596

Wa Kit Sou, K. Cheung, Man Kai Leong, Soi-kei Mak

Using Macao-PISA 2012 data collected from socio-economically disadvantaged and advantaged students, this study identified two sets of important learning factors that distinguished between low- and high-performing disadvantaged students, and between low- and high-performing advantaged students, respectively. The findings of this research contribute to a better understanding of the reasons for Macao’s high-quality and equitable education as compared to other regions with high mathematical literacy performance while also revealing the crux of small inequities in its education system. The analysis method used in this paper provides a paradigm for data mining research using large-scale assessment data and helps researchers better grasp the state of education at the local level.

Citations: 1

Designing and Validating a Criterion-Referenced Assessment of High Utility Academic Vocabulary in English for Elementary and Middle School Students

Chinese/English journal of educational measurement and evaluation

Pub Date : 2021-12-01 DOI: 10.59863/lzgi8844

Christopher Barr, D. August, Lauren Artzi, Coleen D. Carlson, D. Francis

This study reports on the design and validation of a vertically equated assessment of academic vocabulary that generalizes to a meaningful corpus of words and is measured on a developmental scale: the Test of Academic Vocabulary in English (TAVE). The study builds on previous pilot work and uses a larger sample of students who are English learners (ELs) and non-EL students in grades 3 to 8 (n = 2,238) from a large urban Southwestern region, and describes the rationale and process of corpus and assessment development. The findings indicate that the academic vocabulary construct is unidimensional and has both strong reliability and criterion validity. The TAVE was also able to discriminate performance by grade level in lower grades. For research, this study identifies a developmental metric on which student scores not only generalize back to a meaningful corpus of words found in academic texts but also offer specific expectations about which words in the corpus students would know. For practice, this study offers a tool that provides scores that are directly comparable across grades and could potentially be used to track growth across both the short and long term.

Citations: 0

Development and Validation of a Criterion-Referenced Test of English Academic Vocabulary for Elementary and Middle School Students

Chinese/English journal of educational measurement and evaluation

Pub Date : 2021-12-01 DOI: 10.59863/jzmp8956

Christopher Barr, Diane L. August, Lauren Artzi, Coleen Carlson, David Francis

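This study developed the Test of Academic Vocabulary in English (TAVE) and reports on its design and validation. The test is intended to measure students' command of academic vocabulary; test forms for different grades were vertically equated, and results can be placed on a single developmental scale. In addition, test scores generalize to a meaningful corpus of words. Building on previous pilot work, the study used a sample of 2,238 students in grades 3 to 8 from a large urban area in the Southwest, including both English learners (ELs) and non-EL students, and describes the rationale and process of corpus and test development. The results show that the TAVE has a unidimensional structure and high reliability and criterion-related validity, and that the test discriminates well among students' vocabulary knowledge across the lower grades. For research, the study proposes a developmental metric that allows students' test scores to generalize to a meaningful corpus covering words commonly found in academic texts, and that yields the probability that a student knows each word in the corpus. For practice, the study provides a test whose scores are comparable across grades and that can be used to track short- or long-term changes in students' vocabulary.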

Citations: 0

Analysis of a Mathematical Problem-Solving Test on Speed and Students' Strategies: A Mixed Item Response Theory Approach

Chinese/English journal of educational measurement and evaluation

Pub Date : 2021-10-01 DOI: 10.59863/zhfp5485

Chunlian Jiang, Do-Hong Kim, Chuang Wang

The present study used the mixed item response theory (IRT) model to identify qualitatively distinct subgroups of sixth-grade students with respect to their performance on word problems on speed. A total of 345 Singaporean students and 361 Chinese students took a problem-solving test on speed. The mixed IRT analysis revealed two latent classes — the algebra proficient group and the algebra novice group. The algebra proficient group was more likely to use traditional algebraic and arithmetic strategies to solve the problems, whereas the algebra novice group was more likely to use model drawing, unitary, and guess-and-check strategies, in addition to using traditional arithmetic and algebraic strategies. Findings of the study indicate that a greater variety of problem-solving strategies could be encouraged in upper primary schools to help students make connections among these strategies, in particular, between these strategies and the abstract algebraic strategies, and finally to achieve a successful transition from arithmetic to algebra learning.

Citations: 0