A Latent Class IRT Approach to Defining and Measuring Language Proficiency
Tammy D. Tolar, D. Francis, Paulina A. Kulesz, K. Stuebing
Chinese/English Journal of Educational Measurement and Evaluation, March 2021, pp. 49-73. doi:10.59863/ycua8620

English learner (EL) status has high-stakes implications for determining when and how ELs should be evaluated for academic achievement. In the US, students designated as English learners are assessed annually for English language proficiency (ELP), a complex construct whose conceptualization has evolved in recent years to reflect more precisely the language demands of content area achievement, as reflected in the standards of individual states and of state language assessment consortia such as WIDA and ELPA21. The goal of this paper was to examine the potential role and utility of content area assessments in validating language proficiency mastery criteria. Specifically, we applied mixture item response models to identify two classes of EL students: (1) ELs for whom English language arts and math achievement test items have difficulty and discrimination parameters similar to those for non-ELs, and (2) ELs for whom the test items function differently. We used latent class IRT methods to identify the two groups of ELs and to evaluate the effects of the ELP subscales (reading, writing, listening, and speaking) on group membership. Only reading and writing were significant predictors of class membership. Cut-scores based on ELP summary scores were imperfect predictors of class membership and indicated the need for finer differentiation within the top proficiency category. This study demonstrates the importance of linking definitions of ELP to the context in which ELP is used and suggests the potential value of psychometric analyses when language proficiency standards are linked to the language requirements for content area achievement.

{"title":"A Latent Class IRT Approach to Defining and Measuring Language Proficiency.","authors":"Tammy D Tolar, D. Francis, Paulina A. Kulesz, K. Stuebing","doi":"10.59863/ycua8620","DOIUrl":"https://doi.org/10.59863/ycua8620","url":null,"abstract":"English language learner (EL) status has high stakes implications for determining when and how ELs should be evaluated for academic achievement. In the US, students designated as English learners are assessed annually for English language proficiency (ELP), a complex construct whose conceptualization has evolved in recent years to reflect more precisely the language demands of content area achievement as reflected in the standards of individual states and state language assessment consortia, such as WIDA and ELPA21. The goal of this paper was to examine the possible role for and utility of using content area assessments to validate language proficiency mastery criteria. Specifically, we applied mixture item response models to identify two classes of EL students: (1) ELs for whom English language arts and math achievement test items have similar difficulty and discrimination parameters as they do for non-ELs and (2) ELs for whom the test items function differently. We used latent class IRT methods to identify the two groups of ELs and to evaluate the effects of different subscales of ELP (reading, writing, listening, and speaking) on group membership. Only reading and writing were significant predictors of class membership. Cut-scores based on summary scores of ELP were imperfect predictors of class membership and indicated the need for finer differentiation within the top proficiency category. This study demonstrates the importance of linking definitions of ELP to the context for which ELP is used and suggests the possible value of psychometric analyses when language proficiency standards are linked to the language requirements for content area achievement.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"29 1","pages":"49-73"},"PeriodicalIF":0.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81219508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application Innovation of Educational Measurement Theory, Method, and Technology in China's New College Entrance Examination Reform
Zhengyan Liang, Minqiang Zhang, Feifei Huang, Derong Kang, Lingling Xu
Chinese/English Journal of Educational Measurement and Evaluation, March 2021. doi:10.59863/cbjl1170

China's new college entrance examination (the new gaokao) reform gives researchers and practitioners of educational measurement an opportunity to participate directly in the reform. In-depth research on the characteristics of the new gaokao and the issues it faces, together with corresponding theoretical, methodological, and technical solutions, will not only help the examination reform proceed smoothly but also expand and enrich the research and application of educational measurement. This article discusses and offers suggestions on several issues related to the new gaokao reform: the stability of the examination under the new scoring methods and subject selection, the equating of test scores across biannual administrations and cross-year comparisons, and the provision of feedback to basic education based on analyses of gaokao data.

{"title":"Application Innovation of Educational Measurement Theory, Method, and Technology in China’s New College Entrance Examination Reform","authors":"Zhengyan Liang, Minqiang Zhang, Feifei Huang, Derong Kang, Lingling Xu","doi":"10.59863/cbjl1170","DOIUrl":"https://doi.org/10.59863/cbjl1170","url":null,"abstract":"China’s new college entrance examination (the new gaokao) reform provides an opportunity for researchers and practitioners of educational measurement to directly participate in the reform. Therefore, conducting in-depth research on the characteristics of the new gaokao and the issues it faces, and finding corresponding solutions theoretically, methodologically, and technically will not only help to deploy the education examination reform smoothly, but also expand and enrich the research and application of educational measurement. This article provides discussions and suggestions on some issues related to the new gaokao reform, including the stability issue of the examination brought by the scoring methods and subject selection, the equating issue of test scores due to biannual tests or cross-year comparisons, and the issue of giving feedback to basic education based on the analysis of the gaokao data.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89724141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Development of Psychological and Educational Measurement in China","authors":"Houcan Zhang, Fang Luo","doi":"10.59863/buai8988","DOIUrl":"https://doi.org/10.59863/buai8988","url":null,"abstract":"[No abstract.]","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"5 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90149016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CEJEME创刊号前言","authors":"Cai Li, Tao Xin","doi":"10.59863/rjbo6659","DOIUrl":"https://doi.org/10.59863/rjbo6659","url":null,"abstract":"【本文无摘要。】","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"103 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80768562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
De-"Constructing" Test Validation
S. Sireci
Chinese/English Journal of Educational Measurement and Evaluation, December 2020. doi:10.59863/ckhh8837

Construct validity theory presents the most comprehensive description of "validity" as it pertains to educational and psychological testing. The term "construct validity" was introduced in 1954 in the Technical Recommendations for Psychological Tests and Diagnostic Techniques (American Psychological Association [APA], 1954) and subsequently elucidated by two members of the 1954 committee, Cronbach and Meehl (1955). Construct validity theory has had enormous impact on theoretical descriptions of validity, but it was not explicitly supported by the last two versions of the Standards for Educational and Psychological Testing (American Educational Research Association [AERA] et al., 1999, 2014). In this article I trace some of the history of the debate regarding the importance of construct validity theory for test validation, identify the essential elements of construct validity theory that are critical for validating the use of a test for a particular purpose, and propose a framework for test validation that focuses on test use rather than test constructs. This "de-constructed" approach involves four steps: (a) clearly articulating testing purposes, (b) identifying potential negative consequences of test use, (c) crossing test purposes and potential misuses with the five sources of validity evidence listed in the AERA et al. (2014) Standards for Educational and Psychological Testing, and (d) prioritizing the sources of validity evidence needed to build a sound validity argument that focuses on test use and consequences. The goals of de-constructed validation are to embrace the major tenets of construct validity theory by using them to develop a coherent and comprehensive validity argument that is comprehensible to psychometricians, court justices, policy makers, and the general public, and that is consistent with the AERA et al. (2014) Standards.

{"title":"De-“Constructing” Test Validation","authors":"S. Sireci","doi":"10.59863/ckhh8837","DOIUrl":"https://doi.org/10.59863/ckhh8837","url":null,"abstract":"Construct validity theory presents the most comprehensive description of “validity” as it pertains to educational and psychological testing. The term “construct validity” was introduced in 1954 in the Technical Recommendations for Psychological Tests and Diagnostic Techniques (American Psychological Association [APA], 1954), and subsequently elucidated by two members of the 1954 committee — Cronbach and Meehl (1955). Construct validity theory has had enormous impact on the theoretical descriptions of validity, but it was not explicitly supported by the last two versions of the Standards for Educational and Psychological Testing (American Educational Research Association [AERA] et al., 1999, 2014). In this article I trace some of the history of the debate regarding the importance of construct validity theory for test val- idation, identify the essential elements of construct validity theory that are critical for validating the use of a test for a particular purpose, and propose a framework for test validation that focuses on test use, rather than test construct. This “de-constructed” approach involves four steps: (a) clearly articulating testing purposes, (b) identifying potential negative consequences of test use, (c) crossing test purposes and potential misuses with the five sources of validity evidence listed in the AERA et al. (2014) Standards for Educational and Psychological Testing, and (d) prioritizing the sources of validity evidence needed to build a sound validity argument that focuses on test use and consequences. The goals of deconstructed validation are to embrace the major tenets involved in construct validity theory by using them to develop a coherent and comprehensive validity argument that is comprehensible to psychometricians, court justices, policy makers, and the general public; and is consistent with the AERA et al. (2014) Standards.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83654101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Inaugural Issue of CEJEME","authors":"L. Cai, Tao Xin","doi":"10.59863/vhqo9263","DOIUrl":"https://doi.org/10.59863/vhqo9263","url":null,"abstract":"[No abstract.]","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77001100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Intellectual History of Parametric Item Response Theory Models in the Twentieth Century
D. Thissen, L. Steinberg
Chinese/English Journal of Educational Measurement and Evaluation, December 2020. doi:10.59863/gpml7603

The intellectual history of parametric item response theory (IRT) models is traced from ideas that originated with E. L. Thorndike, L. L. Thurstone, and Percival Symonds in the early twentieth century. Gradual formulation as a set of latent variable models occurred, culminating in publications by Paul Lazarsfeld and Frederic Lord around 1950. IRT remained the province of theoreticians without practical application until the 1970s, when advances in computational technology made data analysis using the models possible. At about the same time, the original normal ogive and simple logistic models were augmented with more complex models for multiple-choice and polytomous items. During the final decades of the twentieth century, and continuing into the twenty-first, IRT has become the dominant basis for large-scale educational assessment.

{"title":"An Intellectual History of Parametric Item Response Theory Models in the Twentieth Century","authors":"D. Thissen, L. Steinberg","doi":"10.59863/gpml7603","DOIUrl":"https://doi.org/10.59863/gpml7603","url":null,"abstract":"The intellectual history of parametric item response theory (IRT) models is traced from ideas that originated with E.L. Thorndike, L.L. Thurstone, and Percival Symonds in the early twentieth century. Gradual formulation as a set of latent vari- able models occurred, culminating in publications by Paul Lazarsfeld and Federic Lord around 1950. IRT remained the province of theoreticians without practical ap- plication until the 1970s, when advances in computational technology made possible data analysis using the models. About the same time, the original normal ogive and simple logistic models were augmented with more complex models for multiple- choice and polytomous items. During the final decades of the twentieth century, and continuing into the twenty-first, IRT has become the dominant basis for large-scale educational assessment.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83641942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}