A Latent Class IRT Approach to Defining and Measuring Language Proficiency
Tammy D. Tolar, D. Francis, Paulina A. Kulesz, K. Stuebing
Chinese/English Journal of Educational Measurement and Evaluation, March 2021, pp. 49-73. doi:10.59863/ycua8620

English learner (EL) status has high-stakes implications for determining when and how ELs should be evaluated for academic achievement. In the US, students designated as English learners are assessed annually for English language proficiency (ELP), a complex construct whose conceptualization has evolved in recent years to reflect more precisely the language demands of content area achievement, as reflected in the standards of individual states and of state language assessment consortia such as WIDA and ELPA21. The goal of this paper was to examine the potential role and utility of content area assessments in validating language proficiency mastery criteria. Specifically, we applied mixture item response models to identify two classes of EL students: (1) ELs for whom English language arts and math achievement test items have difficulty and discrimination parameters similar to those for non-ELs, and (2) ELs for whom the test items function differently. We used latent class IRT methods to identify the two groups of ELs and to evaluate the effects of the ELP subscales (reading, writing, listening, and speaking) on group membership. Only reading and writing were significant predictors of class membership. Cut-scores based on ELP summary scores were imperfect predictors of class membership and indicated the need for finer differentiation within the top proficiency category. This study demonstrates the importance of linking definitions of ELP to the context in which ELP is used and suggests the potential value of psychometric analyses when language proficiency standards are linked to the language requirements for content area achievement.

{"title":"A Latent Class IRT Approach to Defining and Measuring Language Proficiency.","authors":"Tammy D Tolar, D. Francis, Paulina A. Kulesz, K. Stuebing","doi":"10.59863/ycua8620","DOIUrl":"https://doi.org/10.59863/ycua8620","url":null,"abstract":"English language learner (EL) status has high stakes implications for determining when and how ELs should be evaluated for academic achievement. In the US, students designated as English learners are assessed annually for English language proficiency (ELP), a complex construct whose conceptualization has evolved in recent years to reflect more precisely the language demands of content area achievement as reflected in the standards of individual states and state language assessment consortia, such as WIDA and ELPA21. The goal of this paper was to examine the possible role for and utility of using content area assessments to validate language proficiency mastery criteria. Specifically, we applied mixture item response models to identify two classes of EL students: (1) ELs for whom English language arts and math achievement test items have similar difficulty and discrimination parameters as they do for non-ELs and (2) ELs for whom the test items function differently. We used latent class IRT methods to identify the two groups of ELs and to evaluate the effects of different subscales of ELP (reading, writing, listening, and speaking) on group membership. Only reading and writing were significant predictors of class membership. Cut-scores based on summary scores of ELP were imperfect predictors of class membership and indicated the need for finer differentiation within the top proficiency category. This study demonstrates the importance of linking definitions of ELP to the context for which ELP is used and suggests the possible value of psychometric analyses when language proficiency standards are linked to the language requirements for content area achievement.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"29 1","pages":"49-73"},"PeriodicalIF":0.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81219508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application Innovation of Educational Measurement Theory, Method, and Technology in China's New College Entrance Examination Reform
Zhengyan Liang, Minqiang Zhang, Feifei Huang, Derong Kang, Lingling Xu
Chinese/English Journal of Educational Measurement and Evaluation, March 2021. doi:10.59863/cbjl1170

China's new college entrance examination (the new gaokao) reform gives researchers and practitioners of educational measurement an opportunity to participate directly in the reform. In-depth research on the characteristics of the new gaokao and the issues it faces, together with corresponding theoretical, methodological, and technical solutions, will not only help the examination reform proceed smoothly but also expand and enrich the research and application of educational measurement. This article discusses and offers suggestions on several issues related to the new gaokao reform: the stability of the examination under the new scoring methods and subject selection, the equating of test scores across biannual administrations and cross-year comparisons, and the provision of feedback to basic education based on analyses of gaokao data.

{"title":"Application Innovation of Educational Measurement Theory, Method, and Technology in China’s New College Entrance Examination Reform","authors":"Zhengyan Liang, Minqiang Zhang, Feifei Huang, Derong Kang, Lingling Xu","doi":"10.59863/cbjl1170","DOIUrl":"https://doi.org/10.59863/cbjl1170","url":null,"abstract":"China’s new college entrance examination (the new gaokao) reform provides an opportunity for researchers and practitioners of educational measurement to directly participate in the reform. Therefore, conducting in-depth research on the characteristics of the new gaokao and the issues it faces, and finding corresponding solutions theoretically, methodologically, and technically will not only help to deploy the education examination reform smoothly, but also expand and enrich the research and application of educational measurement. This article provides discussions and suggestions on some issues related to the new gaokao reform, including the stability issue of the examination brought by the scoring methods and subject selection, the equating issue of test scores due to biannual tests or cross-year comparisons, and the issue of giving feedback to basic education based on the analysis of the gaokao data.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89724141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Development of Psychological and Educational Measurement in China","authors":"Houcan Zhang, Fang Luo","doi":"10.59863/buai8988","DOIUrl":"https://doi.org/10.59863/buai8988","url":null,"abstract":"[No abstract.]","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"5 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90149016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CEJEME创刊号前言","authors":"Cai Li, Tao Xin","doi":"10.59863/rjbo6659","DOIUrl":"https://doi.org/10.59863/rjbo6659","url":null,"abstract":"【本文无摘要。】","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"103 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80768562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
De-"Constructing" Test Validation
S. Sireci
Chinese/English Journal of Educational Measurement and Evaluation, December 2020. doi:10.59863/ckhh8837

Construct validity theory presents the most comprehensive description of "validity" as it pertains to educational and psychological testing. The term "construct validity" was introduced in 1954 in the Technical Recommendations for Psychological Tests and Diagnostic Techniques (American Psychological Association [APA], 1954) and subsequently elucidated by two members of the 1954 committee, Cronbach and Meehl (1955). Construct validity theory has had enormous impact on theoretical descriptions of validity, but it was not explicitly supported by the last two versions of the Standards for Educational and Psychological Testing (American Educational Research Association [AERA] et al., 1999, 2014). In this article I trace some of the history of the debate regarding the importance of construct validity theory for test validation, identify the essential elements of construct validity theory that are critical for validating the use of a test for a particular purpose, and propose a framework for test validation that focuses on test use rather than test constructs. This "de-constructed" approach involves four steps: (a) clearly articulating testing purposes, (b) identifying potential negative consequences of test use, (c) crossing test purposes and potential misuses with the five sources of validity evidence listed in the AERA et al. (2014) Standards for Educational and Psychological Testing, and (d) prioritizing the sources of validity evidence needed to build a sound validity argument that focuses on test use and consequences. The goals of de-constructed validation are to embrace the major tenets of construct validity theory by using them to develop a coherent and comprehensive validity argument that is comprehensible to psychometricians, court justices, policy makers, and the general public, and that is consistent with the AERA et al. (2014) Standards.

{"title":"De-“Constructing” Test Validation","authors":"S. Sireci","doi":"10.59863/ckhh8837","DOIUrl":"https://doi.org/10.59863/ckhh8837","url":null,"abstract":"Construct validity theory presents the most comprehensive description of “validity” as it pertains to educational and psychological testing. The term “construct validity” was introduced in 1954 in the Technical Recommendations for Psychological Tests and Diagnostic Techniques (American Psychological Association [APA], 1954), and subsequently elucidated by two members of the 1954 committee — Cronbach and Meehl (1955). Construct validity theory has had enormous impact on the theoretical descriptions of validity, but it was not explicitly supported by the last two versions of the Standards for Educational and Psychological Testing (American Educational Research Association [AERA] et al., 1999, 2014). In this article I trace some of the history of the debate regarding the importance of construct validity theory for test val- idation, identify the essential elements of construct validity theory that are critical for validating the use of a test for a particular purpose, and propose a framework for test validation that focuses on test use, rather than test construct. This “de-constructed” approach involves four steps: (a) clearly articulating testing purposes, (b) identifying potential negative consequences of test use, (c) crossing test purposes and potential misuses with the five sources of validity evidence listed in the AERA et al. (2014) Standards for Educational and Psychological Testing, and (d) prioritizing the sources of validity evidence needed to build a sound validity argument that focuses on test use and consequences. The goals of deconstructed validation are to embrace the major tenets involved in construct validity theory by using them to develop a coherent and comprehensive validity argument that is comprehensible to psychometricians, court justices, policy makers, and the general public; and is consistent with the AERA et al. (2014) Standards.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83654101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Inaugural Issue of CEJEME","authors":"L. Cai, Tao Xin","doi":"10.59863/vhqo9263","DOIUrl":"https://doi.org/10.59863/vhqo9263","url":null,"abstract":"[No abstract.]","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77001100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Intellectual History of Parametric Item Response Theory Models in the Twentieth Century
D. Thissen, L. Steinberg
Chinese/English Journal of Educational Measurement and Evaluation, December 2020. doi:10.59863/gpml7603

The intellectual history of parametric item response theory (IRT) models is traced from ideas that originated with E. L. Thorndike, L. L. Thurstone, and Percival Symonds in the early twentieth century. Gradual formulation as a set of latent variable models occurred, culminating in publications by Paul Lazarsfeld and Frederic Lord around 1950. IRT remained the province of theoreticians without practical application until the 1970s, when advances in computational technology made data analysis using the models possible. At about the same time, the original normal ogive and simple logistic models were augmented with more complex models for multiple-choice and polytomous items. During the final decades of the twentieth century, and continuing into the twenty-first, IRT has become the dominant basis for large-scale educational assessment.

{"title":"An Intellectual History of Parametric Item Response Theory Models in the Twentieth Century","authors":"D. Thissen, L. Steinberg","doi":"10.59863/gpml7603","DOIUrl":"https://doi.org/10.59863/gpml7603","url":null,"abstract":"The intellectual history of parametric item response theory (IRT) models is traced from ideas that originated with E.L. Thorndike, L.L. Thurstone, and Percival Symonds in the early twentieth century. Gradual formulation as a set of latent vari- able models occurred, culminating in publications by Paul Lazarsfeld and Federic Lord around 1950. IRT remained the province of theoreticians without practical ap- plication until the 1970s, when advances in computational technology made possible data analysis using the models. About the same time, the original normal ogive and simple logistic models were augmented with more complex models for multiple- choice and polytomous items. During the final decades of the twentieth century, and continuing into the twenty-first, IRT has become the dominant basis for large-scale educational assessment.","PeriodicalId":72586,"journal":{"name":"Chinese/English journal of educational measurement and evaluation","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83641942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}