Journal of Educational and Behavioral Statistics最新文献_第10页

Computational Strategies and Estimation Performance With Bayesian Semiparametric Item Response Theory Models 贝叶斯半参数项目反应理论模型的计算策略和估计性能

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2021-01-27 DOI: 10.3102/10769986221136105

S. Paganin, C. Paciorek, Claudia Wehrhahn, Abel Rodríguez, S. Rabe-Hesketh, P. de Valpine

Item response theory (IRT) models typically rely on a normality assumption for subject-specific latent traits, which is often unrealistic in practice. Semiparametric extensions based on Dirichlet process mixtures (DPMs) offer a more flexible representation of the unknown distribution of the latent trait. However, the use of such models in the IRT literature has been extremely limited, in good part because of the lack of comprehensive studies and accessible software tools. This article provides guidance for practitioners on semiparametric IRT models and their implementation. In particular, we rely on NIMBLE, a flexible software system for hierarchical models that enables the use of DPMs. We highlight efficient sampling strategies for model estimation and compare inferential results under parametric and semiparametric models.

项目反应理论（IRT）模型通常依赖于对特定主题潜在特征的正态性假设，这在实践中往往是不现实的。基于狄利克雷过程混合物（DPM）的半参数扩展为潜在特征的未知分布提供了更灵活的表示。然而，IRT文献中对此类模型的使用极为有限，这在很大程度上是因为缺乏全面的研究和可访问的软件工具。本文为从业者提供了关于半参数IRT模型及其实现的指导。特别是，我们依赖NIMBLE，这是一个用于分层模型的灵活软件系统，可以使用DPM。我们强调了模型估计的有效采样策略，并比较了参数和半参数模型下的推断结果。

引用次数: 5

Regression Discontinuity Designs With an Ordinal Running Variable: Evaluating the Effects of Extended Time Accommodations for English-Language Learners 具有有序运行变量的回归不连续设计:评估英语学习者延长住宿时间的影响

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2021-01-20 DOI: 10.3102/10769986221090275

Youmi Suk, Peter M Steiner, Jee-Seon Kim, Hyunseung Kang

Regression discontinuity (RD) designs are commonly used for program evaluation with continuous treatment assignment variables. But in practice, treatment assignment is frequently based on ordinal variables. In this study, we propose an RD design with an ordinal running variable to assess the effects of extended time accommodations (ETA) for English-language learners (ELLs). ETA eligibility is determined by ordinal ELL English-proficiency categories of National Assessment of Educational Progress data. We discuss the identification and estimation of the average treatment effect (ATE), intent-to-treat effect, and the local ATE at the cutoff. We also propose a series of sensitivity analyses to probe the effect estimates’ robustness to the choices of scaling functions and cutoff scores and remaining confounding.

回归不连续(RD)设计通常用于具有连续处理分配变量的方案评估。但在实践中，治疗分配往往是基于有序变量。在这项研究中，我们提出了一个带有有序运行变量的RD设计来评估延长时间住宿(ETA)对英语学习者(ELLs)的影响。ETA资格由国家教育进步评估数据的有序英语水平类别决定。我们讨论了识别和估计平均处理效果(ATE)，意向治疗效果，和局部ATE在截止点。我们还提出了一系列敏感性分析，以探讨效果估计对尺度函数和截止分数的选择和剩余混淆的稳健性。

引用次数: 0

Cross-Classified Random Effects Modeling for Moderated Item Calibration 适度项目校准的交叉分类随机效应模型

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2021-01-12 DOI: 10.3102/1076998620983908

Seungwon Chung, Li Cai

In the research reported here, we propose a new method for scale alignment and test scoring in the context of supporting students with disabilities. In educational assessment, students from these special populations take modified tests because of a demonstrated disability that requires more assistance than standard testing accommodation. Updated federal education legislation and guidance require that these students be assessed and included in state education accountability systems, and their achievement reported with respect to the same rigorous content and achievement standards that the state adopted. Routine item calibration and linking methods are not feasible because the size of these special populations tends to be small. We develop a unified cross-classified random effects model that utilizes item response data from the general population as well as judge-provided data from subject matter experts in order to obtain revised item parameter estimates for use in scoring modified tests. We extend the Metropolis–Hastings Robbins–Monro algorithm to estimate the parameters of this model. The proposed method is applied to Braille test forms in a large operational multistate English language proficiency assessment program. Our work not only allows a broader range of modifications that is routinely considered in large-scale educational assessments but also directly incorporates the input from subject matter experts who work directly with the students needing support. Their structured and informed feedback deserves more attention from the psychometric community.

在本研究中，我们提出了一种在支持残疾学生的背景下进行量表校准和测试评分的新方法。在教育评估中，来自这些特殊群体的学生参加修改后的考试，因为他们证明有残疾，需要比标准考试便利更多的帮助。最新的联邦教育立法和指导要求对这些学生进行评估，并将其纳入州教育问责制，他们的成绩报告与州采用的严格内容和成绩标准相同。常规的项目校准和连接方法是不可行的，因为这些特殊人群的规模往往很小。我们开发了一个统一的交叉分类随机效应模型，该模型利用了来自一般人群的项目反应数据以及来自主题专家的判断提供的数据，以便获得用于评分修改测试的修订项目参数估计。我们扩展了Metropolis-Hastings - Robbins-Monro算法来估计该模型的参数。该方法已应用于一个大型多州英语语言能力评估项目的盲文测试表格。我们的工作不仅允许在大规模教育评估中常规考虑的更广泛的修改，而且还直接纳入了直接与需要支持的学生一起工作的主题专家的输入。他们的结构化和知情反馈值得心理测量界更多的关注。

{"title":"Cross-Classified Random Effects Modeling for Moderated Item Calibration","authors":"Seungwon Chung, Li Cai","doi":"10.3102/1076998620983908","DOIUrl":"https://doi.org/10.3102/1076998620983908","url":null,"abstract":"In the research reported here, we propose a new method for scale alignment and test scoring in the context of supporting students with disabilities. In educational assessment, students from these special populations take modified tests because of a demonstrated disability that requires more assistance than standard testing accommodation. Updated federal education legislation and guidance require that these students be assessed and included in state education accountability systems, and their achievement reported with respect to the same rigorous content and achievement standards that the state adopted. Routine item calibration and linking methods are not feasible because the size of these special populations tends to be small. We develop a unified cross-classified random effects model that utilizes item response data from the general population as well as judge-provided data from subject matter experts in order to obtain revised item parameter estimates for use in scoring modified tests. We extend the Metropolis–Hastings Robbins–Monro algorithm to estimate the parameters of this model. The proposed method is applied to Braille test forms in a large operational multistate English language proficiency assessment program. Our work not only allows a broader range of modifications that is routinely considered in large-scale educational assessments but also directly incorporates the input from subject matter experts who work directly with the students needing support. Their structured and informed feedback deserves more attention from the psychometric community.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"46 1","pages":"651 - 681"},"PeriodicalIF":2.4,"publicationDate":"2021-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45633091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

A Practical Guide for Analyzing Large-Scale Assessment Data Using Mplus: A Case Demonstration Using the Program for International Assessment of Adult Competencies Data 使用Mplus分析大规模评估数据的实用指南:使用成人能力数据国际评估计划的案例演示

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2020-12-16 DOI: 10.3102/1076998620978554

T. Yamashita, Thomas J. Smith, P. Cummins

In order to promote the use of increasingly available large-scale assessment data in education and expand the scope of analytic capabilities among applied researchers, this study provides step-by-step guidance, and practical examples of syntax and data analysis using Mplus. Concise overview and key unique aspects of large-scale assessment data from the 2012/2014 Program for International Assessment of Adult Competencies (PIAAC) are described. Using commonly-used statistical software including SAS and R, a simple macro program and syntax are developed to streamline the data preparation process. Then, two examples of structural equation models are demonstrated using Mplus. The suggested data preparation and analytic approaches can be immediately applicable to existing large-scale assessment data.

为了促进在教育中使用越来越可用的大规模评估数据，并扩大应用研究人员的分析能力范围，本研究提供了使用Mplus进行语法和数据分析的分步指导和实际示例。简要概述了2012/2014年国际成人能力评估计划（PIAC）的大规模评估数据的主要独特方面。使用包括SAS和R在内的常用统计软件，开发了一个简单的宏程序和语法，以简化数据准备过程。然后，使用Mplus演示了结构方程模型的两个例子。建议的数据准备和分析方法可以立即适用于现有的大规模评估数据。

引用次数: 2

A Review of Handbook of Item Response Theory: Vol. 1 评《项目反应理论手册》第一卷

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2020-12-16 DOI: 10.3102/1076998620978551

Peter F. Halpin

The Handbook of Item Response Theory is an extensive three-volume collection with contributions from leading researchers in the field. This review focuses on Volume 1 (Models). Aside from the Introduction, each of the 33 chapters provides a self-contained presentation of an item response theory (IRT) modeling framework. The chapters share a common notation as well as a uniform organization (Introduction, Model Presentation, Parameter Estimation, Goodness of Fit, an Empirical Example, and a Discussion). Many chapters are leador singleauthored by original developers of the research, and in all cases, the lead authors are highly regarded as experts in the field. The Volume is organized into eight sections, each containing between two and seven chapters focused on types of data—dichotomous responses, polytomous responses, response times—or on types of models—multidimensional, nonparametric, nonmonotone, hierarchical and multilevel as well as generalized modeling approaches that include but are not limited to IRT applications. The coverage of models for polytomous data is especially strong, with seven chapters devoted to this topic. In other areas, the coverage is already appearing somewhat thin in light of recent research trends. For example, a large amount of work has been devoted to the analysis of response times since the publication of the Volume. The three chapters in the Volume provide the foundations of this more recent research, focusing the early work of Rasch, approaches based on cognitive models of decision making, and models for lognormal response times; the latter is extended to the joint modeling of responses and response times in a separate chapter. Generalized modeling approaches is another area that, in retrospect, could have received more thorough coverage of topics such as Bayesian IRT, psychometric applications of networks and graphs, or approaches based on machine learning. There is only one chapter addressing models with categorical latent variables. Despite the inevitable nit-picking about specific omissions, the Handbook certainly provides a thorough characterization of the breadth of active research on statistical models used in the IRT literature. Journal of Educational and Behavioral Statistics 2021, Vol. 46, No. 4, pp. 519–522 DOI: 10.3102/1076998620978551 Article reuse guidelines: sagepub.com/journals-permissions © 2020 AERA. http://jebs.aera.net

该手册的项目反应理论是一个广泛的三卷收集与贡献的主要研究人员在该领域。本综述的重点是第1卷(模型)。除了引言，33章中的每一章都提供了一个项目反应理论(IRT)建模框架的独立展示。章节共享一个共同的符号，以及一个统一的组织(介绍，模型表示，参数估计，拟合优度，一个经验的例子，并讨论)。许多章节都是由该研究的原始开发人员领导或单独撰写的，并且在所有情况下，主要作者都被视为该领域的专家。本卷分为八个部分，每个部分包含两到七章，重点介绍数据类型-二分类响应，多分类响应，响应时间-或模型类型-多维，非参数，非单调，分层和多层以及广义建模方法，包括但不限于IRT应用。多同构数据模型的覆盖面特别强，有七章专门讨论这个主题。在其他领域，鉴于最近的研究趋势，覆盖范围已经显得有些单薄。例如，自本卷出版以来，已经进行了大量的工作来分析响应时间。本卷的三章提供了这一最新研究的基础，重点是Rasch的早期工作，基于决策认知模型的方法，以及对数正态反应时间模型;后者将在单独的一章中扩展到响应和响应时间的联合建模。广义建模方法是另一个领域，回想起来，可以得到更全面的主题覆盖，如贝叶斯IRT，网络和图的心理测量应用，或基于机器学习的方法。只有一章讨论具有分类潜在变量的模型。尽管不可避免地会对具体的遗漏进行挑剔，但该手册确实提供了对IRT文献中使用的统计模型的活跃研究广度的全面描述。教育与行为统计杂志2021,Vol. 46, No. 4, pp. 519-522 DOI: 10.3102/1076998620978551文章重用指南:sagepub.com/journals-permissions©2020 AERA。http://jebs.aera.net

{"title":"A Review of Handbook of Item Response Theory: Vol. 1","authors":"Peter F. Halpin","doi":"10.3102/1076998620978551","DOIUrl":"https://doi.org/10.3102/1076998620978551","url":null,"abstract":"The Handbook of Item Response Theory is an extensive three-volume collection with contributions from leading researchers in the field. This review focuses on Volume 1 (Models). Aside from the Introduction, each of the 33 chapters provides a self-contained presentation of an item response theory (IRT) modeling framework. The chapters share a common notation as well as a uniform organization (Introduction, Model Presentation, Parameter Estimation, Goodness of Fit, an Empirical Example, and a Discussion). Many chapters are leador singleauthored by original developers of the research, and in all cases, the lead authors are highly regarded as experts in the field. The Volume is organized into eight sections, each containing between two and seven chapters focused on types of data—dichotomous responses, polytomous responses, response times—or on types of models—multidimensional, nonparametric, nonmonotone, hierarchical and multilevel as well as generalized modeling approaches that include but are not limited to IRT applications. The coverage of models for polytomous data is especially strong, with seven chapters devoted to this topic. In other areas, the coverage is already appearing somewhat thin in light of recent research trends. For example, a large amount of work has been devoted to the analysis of response times since the publication of the Volume. The three chapters in the Volume provide the foundations of this more recent research, focusing the early work of Rasch, approaches based on cognitive models of decision making, and models for lognormal response times; the latter is extended to the joint modeling of responses and response times in a separate chapter. Generalized modeling approaches is another area that, in retrospect, could have received more thorough coverage of topics such as Bayesian IRT, psychometric applications of networks and graphs, or approaches based on machine learning. There is only one chapter addressing models with categorical latent variables. Despite the inevitable nit-picking about specific omissions, the Handbook certainly provides a thorough characterization of the breadth of active research on statistical models used in the IRT literature. Journal of Educational and Behavioral Statistics 2021, Vol. 46, No. 4, pp. 519–522 DOI: 10.3102/1076998620978551 Article reuse guidelines: sagepub.com/journals-permissions © 2020 AERA. http://jebs.aera.net","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"46 1","pages":"519 - 522"},"PeriodicalIF":2.4,"publicationDate":"2020-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46999108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Acknowledgments 致谢

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2020-12-01 DOI: 10.3102/1076998620958383

C. Aßmann, Yinyin Chen, Bryan Keller, Jee-Seon Kim, K. Kim

Christian Aßmann, University of Bamberg Guillaume Basse, Stanford University Michela Battauz, University of Udine Eli Ben-Michael, University of CaliforniaBerkeley Howard Bloom, Manpower Demonstration Research Corporation (MDRC) Ulf Bockenholt, Northwestern University Daniel Bolt, University of Wisconsin Eric Bradlow, Wharton School of the University of Pennsylvania Henry Braun, Boston College Katherine Castellano, Educational Testing Service Wendy Chan, University of Pennsylvania Yinghan Chen, University of Nevada, Reno Yinyin Chen, University of Illinois at Urbana-Champaign Yunxiao Chen, London School of Economics and Political Science Edison Choe, Graduate Management Admission Council Steven Culpepper, University of Illinois at Urbana-Champaign Paul De Boeck, The Ohio State University Jimmy de la Torre, The University of Hong Kong Dries Debeer, University of Leuven (KU Leuven) Peng Ding, University of CaliforniaBerkeley Nianbo Dong, University of North Carolina at Chapel Hill Fritz Drasgow, University of Illinois at Urbana-Champaign Keelan Evanini, Educational Testing Service Jean-Paul Fox, University of Twente Mark Fredrickson, University of Michigan Jorge Gonzaléz, Pontificia Universidad Catolica de Chile Simon Grund, Leibniz-Institut fur die Padagogik der Naturwissenschaften und Mathematik an der Universitat Kiel Hongwen Guo, Educational Testing Service Gregory Hancock, University of MarylandCollege Park Ben Hansen, University of Michigan Jeffrey Harring, University of MarylandCollege Park Johannes Hartig, German Institute for International Educational Research (DIPF) Michael Harwell, University of Minnesota Timothy Hayes, Florida International University Yong He, ACT, Inc. Carolyn Hill, Manpower Demonstration Research Corporation (MDRC) Minjeong Jeon, University of CaliforniaLos Angeles Paul Jewsbury, Educational Testing Service Booil Jo, Stanford University Harry Joe, University of British Columbia Matthew Johnson, Educational Testing Service George Karabatsos, University of IllinoisChicago Luke Keele, University of Pennsylvania Augustin Kelava, Eberhard Karls University Tuebingen Journal of Educational and Behavioral Statistics 2020, Vol. 45, No. 6, pp. 771–773 DOI: 10.3102/1076998620958383 Article reuse guidelines: sagepub.com/journals-permissions © 2020 AERA. http://jebs.aera.net

Christian Aßmann、班贝格-纪尧姆·巴塞大学、斯坦福大学Michela Battaz、乌迪内大学Eli Ben Michael、加州大学伯克利分校Howard Bloom、人力资源示范研究公司（MDRC）Ulf Bockenholt、西北大学Daniel Bolt、威斯康星大学Eric Bradlow、宾夕法尼亚大学沃顿商学院Henry Braun，波士顿学院Katherine Castellano、教育测试服务机构Wendy Chan、宾夕法尼亚大学Yinghan Chen、内华达大学Reno Yinyin Chen、伊利诺伊大学厄巴纳-香槟分校Yunxiao Chen、伦敦政治经济学院Edison Choe、研究生管理招生委员会Steven Culpepper、，俄亥俄州立大学Jimmy de la Torre、香港大学Dries Debeer、鲁汶大学（KU Leuven）Peng Ding、加州大学伯克利分校Nianbo Dong、北卡罗来纳大学教堂山分校Fritz Drasgow、伊利诺伊大学厄巴纳-香槟分校Keelan Evanini、教育测试服务机构Jean-Paul Fox、特文特大学Mark Fredrickson、，密歇根大学Jorge Gonzaléz、智利天主教大学Simon Grund、莱布尼茨自然科学与数学研究所和基尔大学郭洪文、教育测试服务机构Gregory Hancock、马里兰大学Ben Hansen、密歇根大学Jeffrey Harring、马里兰大学Johannes Hartig，德国国际教育研究所（DIPF）Michael Harwell，明尼苏达大学Timothy Hayes，佛罗里达国际大学Yong He，ACT，股份有限公司Carolyn Hill，人力示范研究公司（MDRC）Minjeong Jeon，加州大学洛杉矶分校Paul Jewsbury，教育测试服务机构Booil Jo，斯坦福大学Harry Joe，不列颠哥伦比亚大学Matthew Johnson，教育测试服务机构George Karabatsos，伊利诺伊大学Chicago Luke Keele，宾夕法尼亚大学Augustin Kelava，Eberhard Karls University Tubingen Journal of Educational and Behavioral Statistics 2020，Vol.45，No.6，第771–773页DOI:10.3102/1076998620958383文章重用指南：sagepub.com/journals-permissions©2020 AERA。http://jebs.aera.net

{"title":"Acknowledgments","authors":"C. Aßmann, Yinyin Chen, Bryan Keller, Jee-Seon Kim, K. Kim","doi":"10.3102/1076998620958383","DOIUrl":"https://doi.org/10.3102/1076998620958383","url":null,"abstract":"Christian Aßmann, University of Bamberg Guillaume Basse, Stanford University Michela Battauz, University of Udine Eli Ben-Michael, University of CaliforniaBerkeley Howard Bloom, Manpower Demonstration Research Corporation (MDRC) Ulf Bockenholt, Northwestern University Daniel Bolt, University of Wisconsin Eric Bradlow, Wharton School of the University of Pennsylvania Henry Braun, Boston College Katherine Castellano, Educational Testing Service Wendy Chan, University of Pennsylvania Yinghan Chen, University of Nevada, Reno Yinyin Chen, University of Illinois at Urbana-Champaign Yunxiao Chen, London School of Economics and Political Science Edison Choe, Graduate Management Admission Council Steven Culpepper, University of Illinois at Urbana-Champaign Paul De Boeck, The Ohio State University Jimmy de la Torre, The University of Hong Kong Dries Debeer, University of Leuven (KU Leuven) Peng Ding, University of CaliforniaBerkeley Nianbo Dong, University of North Carolina at Chapel Hill Fritz Drasgow, University of Illinois at Urbana-Champaign Keelan Evanini, Educational Testing Service Jean-Paul Fox, University of Twente Mark Fredrickson, University of Michigan Jorge Gonzaléz, Pontificia Universidad Catolica de Chile Simon Grund, Leibniz-Institut fur die Padagogik der Naturwissenschaften und Mathematik an der Universitat Kiel Hongwen Guo, Educational Testing Service Gregory Hancock, University of MarylandCollege Park Ben Hansen, University of Michigan Jeffrey Harring, University of MarylandCollege Park Johannes Hartig, German Institute for International Educational Research (DIPF) Michael Harwell, University of Minnesota Timothy Hayes, Florida International University Yong He, ACT, Inc. Carolyn Hill, Manpower Demonstration Research Corporation (MDRC) Minjeong Jeon, University of CaliforniaLos Angeles Paul Jewsbury, Educational Testing Service Booil Jo, Stanford University Harry Joe, University of British Columbia Matthew Johnson, Educational Testing Service George Karabatsos, University of IllinoisChicago Luke Keele, University of Pennsylvania Augustin Kelava, Eberhard Karls University Tuebingen Journal of Educational and Behavioral Statistics 2020, Vol. 45, No. 6, pp. 771–773 DOI: 10.3102/1076998620958383 Article reuse guidelines: sagepub.com/journals-permissions © 2020 AERA. http://jebs.aera.net","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"45 1","pages":"771 - 773"},"PeriodicalIF":2.4,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43561935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Adaptive Weight Estimation of Latent Ability: Application to Computerized Adaptive Testing With Response Revision 潜在能力的自适应权重估计:在计算机自适应测试中的应用

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2020-11-24 DOI: 10.3102/1076998620972800

Shiyu Wang, Houping Xiao, A. Cohen

An adaptive weight estimation approach is proposed to provide robust latent ability estimation in computerized adaptive testing (CAT) with response revision. This approach assigns different weights to each distinct response to the same item when response revision is allowed in CAT. Two types of weight estimation procedures, nonfunctional and functional weight, are proposed to determine the weight adaptively based on the compatibility of each revised response with the assumed statistical model in relation to remaining observations. The application of this estimation approach to a data set collected from a large-scale multistage adaptive testing demonstrates the capability of this method to reveal more information regarding the test taker’s latent ability by using the valid response path compared with only using the very last response. Limited simulation studies were concluded to evaluate the proposed ability estimation method and to compare it with several other estimation procedures in literature. Results indicate that the proposed ability estimation approach is able to provide robust estimation results in two test-taking scenarios.

提出了一种自适应权重估计方法，用于在计算机自适应测试（CAT）中提供具有响应修正的鲁棒潜在能力估计。当CAT中允许修改响应时，这种方法为同一项目的每个不同响应分配不同的权重。提出了两种类型的权重估计程序，即非函数权重和函数权重，以根据每个修正响应与剩余观测值相关的假设统计模型的兼容性自适应地确定权重。将这种估计方法应用于从大规模多级自适应测试中收集的数据集表明，与仅使用最后一次响应相比，这种方法能够通过使用有效响应路径来揭示更多关于考生潜在能力的信息。完成了有限的模拟研究，以评估所提出的能力估计方法，并将其与文献中的其他几种估计程序进行比较。结果表明，所提出的能力估计方法能够在两种测试场景中提供稳健的估计结果。

{"title":"Adaptive Weight Estimation of Latent Ability: Application to Computerized Adaptive Testing With Response Revision","authors":"Shiyu Wang, Houping Xiao, A. Cohen","doi":"10.3102/1076998620972800","DOIUrl":"https://doi.org/10.3102/1076998620972800","url":null,"abstract":"An adaptive weight estimation approach is proposed to provide robust latent ability estimation in computerized adaptive testing (CAT) with response revision. This approach assigns different weights to each distinct response to the same item when response revision is allowed in CAT. Two types of weight estimation procedures, nonfunctional and functional weight, are proposed to determine the weight adaptively based on the compatibility of each revised response with the assumed statistical model in relation to remaining observations. The application of this estimation approach to a data set collected from a large-scale multistage adaptive testing demonstrates the capability of this method to reveal more information regarding the test taker’s latent ability by using the valid response path compared with only using the very last response. Limited simulation studies were concluded to evaluate the proposed ability estimation method and to compare it with several other estimation procedures in literature. Results indicate that the proposed ability estimation approach is able to provide robust estimation results in two test-taking scenarios.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"46 1","pages":"560 - 591"},"PeriodicalIF":2.4,"publicationDate":"2020-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3102/1076998620972800","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45694185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Ordinal Approaches to Decomposing Between-Group Test Score Disparities 分解组间测试成绩差异的顺序方法

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2020-11-11 DOI: 10.3102/1076998620967726

David M. Quinn, Andrew D. Ho

The estimation of test score “gaps” and gap trends plays an important role in monitoring educational inequality. Researchers decompose gaps and gap changes into within- and between-school portions to generate evidence on the role schools play in shaping these inequalities. However, existing decomposition methods assume an equal-interval test scale and are a poor fit to coarsened data such as proficiency categories. This leaves many potential data sources ill-suited for decomposition applications. We develop two decomposition approaches that overcome these limitations: an extension of V, an ordinal gap statistic, and an extension of ordered probit models. Simulations show V decompositions have negligible bias with small within-school samples. Ordered probit decompositions have negligible bias with large within-school samples but more serious bias with small within-school samples. More broadly, our methods enable analysts to (1) decompose the difference between two groups on any ordinal outcome into portions within- and between some third categorical variable and (2) estimate scale-invariant between-group differences that adjust for a categorical covariate.

对考试成绩“差距”和差距趋势的估计在监测教育不平等方面发挥着重要作用。研究人员将差距和差距变化分解为学校内部和学校之间的部分，以产生关于学校在形成这些不平等方面所起作用的证据。然而，现有的分解方法假设了等间隔测试规模，并且不太适合粗糙化的数据，如熟练程度类别。这使得许多潜在的数据源不适合分解应用程序。我们开发了两种克服这些限制的分解方法：V的扩展、有序间隙统计量和有序probit模型的扩展。模拟结果表明，V分解在学校样本较小的情况下具有可忽略的偏差。有序probit分解对于大的校内样本具有可忽略的偏差，但是对于小的校内样本则具有更严重的偏差。更广泛地说，我们的方法使分析师能够（1）将两组之间在任何有序结果上的差异分解为第三分类变量内和之间的部分，以及（2）估计调整分类协变量的组间差异的尺度不变量。

{"title":"Ordinal Approaches to Decomposing Between-Group Test Score Disparities","authors":"David M. Quinn, Andrew D. Ho","doi":"10.3102/1076998620967726","DOIUrl":"https://doi.org/10.3102/1076998620967726","url":null,"abstract":"The estimation of test score “gaps” and gap trends plays an important role in monitoring educational inequality. Researchers decompose gaps and gap changes into within- and between-school portions to generate evidence on the role schools play in shaping these inequalities. However, existing decomposition methods assume an equal-interval test scale and are a poor fit to coarsened data such as proficiency categories. This leaves many potential data sources ill-suited for decomposition applications. We develop two decomposition approaches that overcome these limitations: an extension of V, an ordinal gap statistic, and an extension of ordered probit models. Simulations show V decompositions have negligible bias with small within-school samples. Ordered probit decompositions have negligible bias with large within-school samples but more serious bias with small within-school samples. More broadly, our methods enable analysts to (1) decompose the difference between two groups on any ordinal outcome into portions within- and between some third categorical variable and (2) estimate scale-invariant between-group differences that adjust for a categorical covariate.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"46 1","pages":"466 - 500"},"PeriodicalIF":2.4,"publicationDate":"2020-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3102/1076998620967726","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42874649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

On the Treatment of Missing Data in Background Questionnaires in Educational Large-Scale Assessments: An Evaluation of Different Procedures 教育大规模评估中背景问卷缺失数据的处理:不同程序的评价

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2020-10-27 DOI: 10.3102/1076998620959058

S. Grund, O. Lüdtke, A. Robitzsch

Large-scale assessments (LSAs) use Mislevy’s “plausible value” (PV) approach to relate student proficiency to noncognitive variables administered in a background questionnaire. This method requires background variables to be completely observed, a requirement that is seldom fulfilled. In this article, we evaluate and compare the properties of methods used in current practice for dealing with missing data in background variables in educational LSAs, which rely on the missing indicator method (MIM), with other methods based on multiple imputation. In this context, we present a fully conditional specification (FCS) approach that allows for a joint treatment of PVs and missing data. Using theoretical arguments and two simulation studies, we illustrate under what conditions the MIM provides biased or unbiased estimates of population parameters and provide evidence that methods such as FCS can provide an effective alternative to the MIM. We discuss the strengths and weaknesses of the approaches and outline potential consequences for operational practice in educational LSAs. An illustration is provided using data from the PISA 2015 study.

大规模评估（LSA）使用Mislevy的“合理值”（PV）方法将学生的熟练程度与背景问卷中的非认知变量联系起来。这种方法要求完全观察背景变量，而这一要求很少得到满足。在本文中，我们评估并比较了当前实践中用于处理教育LSA背景变量中缺失数据的方法的特性，这些方法依赖于缺失指标法（MIM），而其他方法基于多重插补。在这种情况下，我们提出了一种完全条件规范（FCS）方法，该方法允许对PV和缺失数据进行联合处理。通过理论论证和两项模拟研究，我们说明了MIM在什么条件下提供了对总体参数的有偏或无偏估计，并提供了证据，证明FCS等方法可以为MIM提供有效的替代方案。我们讨论了这些方法的优势和劣势，并概述了在教育LSA中操作实践的潜在后果。使用PISA 2015研究的数据进行了说明。

{"title":"On the Treatment of Missing Data in Background Questionnaires in Educational Large-Scale Assessments: An Evaluation of Different Procedures","authors":"S. Grund, O. Lüdtke, A. Robitzsch","doi":"10.3102/1076998620959058","DOIUrl":"https://doi.org/10.3102/1076998620959058","url":null,"abstract":"Large-scale assessments (LSAs) use Mislevy’s “plausible value” (PV) approach to relate student proficiency to noncognitive variables administered in a background questionnaire. This method requires background variables to be completely observed, a requirement that is seldom fulfilled. In this article, we evaluate and compare the properties of methods used in current practice for dealing with missing data in background variables in educational LSAs, which rely on the missing indicator method (MIM), with other methods based on multiple imputation. In this context, we present a fully conditional specification (FCS) approach that allows for a joint treatment of PVs and missing data. Using theoretical arguments and two simulation studies, we illustrate under what conditions the MIM provides biased or unbiased estimates of population parameters and provide evidence that methods such as FCS can provide an effective alternative to the MIM. We discuss the strengths and weaknesses of the approaches and outline potential consequences for operational practice in educational LSAs. An illustration is provided using data from the PISA 2015 study.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"46 1","pages":"430 - 465"},"PeriodicalIF":2.4,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49151735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 14

Block What You Can, Except When You Shouldn’t 阻止你能阻止的，除非你不应该阻止

IF 2.4 3区心理学 Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics

Pub Date : 2020-10-26 DOI: 10.3102/10769986211027240

Nicole E. Pashley, Luke W. Miratrix

Several branches of the potential outcome causal inference literature have discussed the merits of blocking versus complete randomization. Some have concluded it can never hurt the precision of estimates, and some have concluded it can hurt. In this article, we reconcile these apparently conflicting views, give a more thorough discussion of what guarantees no harm, and discuss how other aspects of a blocked design can cost, all in terms of estimator precision. We discuss how the different findings are due to different sampling models and assumptions of how the blocks were formed. We also connect these ideas to common misconceptions; for instance, we show that analyzing a blocked experiment as if it were completely randomized, a seemingly conservative method, can actually backfire in some cases. Overall, we find that blocking can have a price but that this price is usually small and the potential for gain can be large. It is hard to go too far wrong with blocking.

潜在结果因果推理文献的几个分支已经讨论了阻断与完全随机化的优点。一些人得出结论，它永远不会损害估计的准确性，而另一些人则得出结论，它可能会造成损害。在本文中，我们调和了这些明显冲突的观点，更深入地讨论了什么保证不会造成伤害，并讨论了阻塞设计的其他方面可能造成的成本，所有这些都是根据估算器的精度进行的。我们讨论了不同的发现是如何由于不同的采样模型和假设是如何形成的块。我们还将这些想法与常见的误解联系起来;例如，我们表明，分析一个封闭的实验，就好像它是完全随机的，一个看似保守的方法，实际上可能在某些情况下适得其反。总的来说，我们发现封锁是有代价的，但这个代价通常很小，而收益的潜力可能很大。阻塞很难犯太大的错误。

引用次数: 7