Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs

IF 1.4 4区 心理学 Q3 PSYCHOLOGY, APPLIED Journal of Educational Measurement Pub Date : 2024-08-01 DOI:10.1111/jedm.12409
Hyo Jeong Shin, Christoph König, Frederic Robin, Andreas Frey, Kentaro Yamamoto
{"title":"Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs","authors":"Hyo Jeong Shin, Christoph König, Frederic Robin, Andreas Frey, Kentaro Yamamoto","doi":"10.1111/jedm.12409","DOIUrl":null,"url":null,"abstract":"Many international large‐scale assessments (ILSAs) have switched to multistage adaptive testing (MST) designs to improve measurement efficiency in measuring the skills of the heterogeneous populations around the world. In this context, previous literature has reported the acceptable level of model parameter recovery under the MST designs when the current item response theory (IRT)‐based scaling models are used. However, previous studies have not considered the influence of realistic phenomena commonly observed in ILSA data, such as item‐by‐country interactions, repeated use of MST designs in subsequent cycles, and nonresponse, including omitted and not‐reached items. The purpose of this study is to examine the robustness of current IRT‐based scaling models to these three factors under MST designs, using the Programme for International Student Assessment (PISA) designs as an example. A series of simulation studies show that the IRT scaling models used in the PISA are robust to repeated use of the MST design in a subsequent cycle with fewer items and smaller sample sizes, while item‐by‐country interactions and items not‐reached have negligible to modest effects on model parameter estimation, and omitted responses have the largest effect. The discussion section provides recommendations and implications for future MST designs and scaling models for ILSAs.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational Measurement","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1111/jedm.12409","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"PSYCHOLOGY, APPLIED","Score":null,"Total":0}
引用次数: 0

Abstract

Many international large‐scale assessments (ILSAs) have switched to multistage adaptive testing (MST) designs to improve measurement efficiency in measuring the skills of the heterogeneous populations around the world. In this context, previous literature has reported the acceptable level of model parameter recovery under the MST designs when the current item response theory (IRT)‐based scaling models are used. However, previous studies have not considered the influence of realistic phenomena commonly observed in ILSA data, such as item‐by‐country interactions, repeated use of MST designs in subsequent cycles, and nonresponse, including omitted and not‐reached items. The purpose of this study is to examine the robustness of current IRT‐based scaling models to these three factors under MST designs, using the Programme for International Student Assessment (PISA) designs as an example. A series of simulation studies show that the IRT scaling models used in the PISA are robust to repeated use of the MST design in a subsequent cycle with fewer items and smaller sample sizes, while item‐by‐country interactions and items not‐reached have negligible to modest effects on model parameter estimation, and omitted responses have the largest effect. The discussion section provides recommendations and implications for future MST designs and scaling models for ILSAs.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
国际学生评估项目多阶段适应性测试设计下项目反应理论模型的稳健性
许多国际性的大规模测评(ILSA)都改用了多阶段适应性测试(MST)设计,以提高测量效率,测量全球异质人群的技能。在这种情况下,以往的文献报道了在多阶段自适应测试设计下,当使用目前基于项目反应理论(IRT)的比例模型时,模型参数恢复的可接受程度。然而,以往的研究并未考虑 ILSA 数据中常见的现实现象的影响,如项目与国家之间的交互作用、在后续周期中重复使用 MST 设计以及非响应(包括遗漏和未达到的项目)。本研究的目的是以国际学生评估项目(PISA)设计为例,检验目前基于 IRT 的缩放模型在 MST 设计下对这三个因素的稳健性。一系列模拟研究表明,在 PISA 项目中使用的 IRT 计分模型在后续周期中重复使用 MST 设计(项目数量更少、样本量更小)时是稳健的,而项目与国家之间的交互作用和未达到的项目对模型参数估计的影响可以忽略不计,甚至微乎其微,而遗漏回答的影响最大。讨论部分为未来的 MST 设计和 ILSA 的比例模型提供了建议和启示。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
2.30
自引率
7.70%
发文量
46
期刊介绍: The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings, as well as be of interest to measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.
期刊最新文献
Sequential Reservoir Computing for Log File‐Based Behavior Process Data Analyses Issue Information Exploring Latent Constructs through Multimodal Data Analysis Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs Modeling Nonlinear Effects of Person‐by‐Item Covariates in Explanatory Item Response Models: Exploratory Plots and Modeling Using Smooth Functions
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1