Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs

Journal of Educational Measurement (IF 1.6; CAS Zone 4, Psychology; JCR Q3, Psychology, Applied) | Pub Date: 2024-07-31 | DOI: 10.1111/jedm.12409

Hyo Jeong Shin, Christoph König, Frederic Robin, Andreas Frey, Kentaro Yamamoto

Published in: Journal of Educational Measurement, vol. 62(3), pp. 392-414. https://onlinelibrary.wiley.com/doi/10.1111/jedm.12409


Abstract

Many international large-scale assessments (ILSAs) have switched to multistage adaptive testing (MST) designs to improve measurement efficiency in measuring the skills of the heterogeneous populations around the world. In this context, previous literature has reported the acceptable level of model parameter recovery under the MST designs when the current item response theory (IRT)-based scaling models are used. However, previous studies have not considered the influence of realistic phenomena commonly observed in ILSA data, such as item-by-country interactions, repeated use of MST designs in subsequent cycles, and nonresponse, including omitted and not-reached items. The purpose of this study is to examine the robustness of current IRT-based scaling models to these three factors under MST designs, using the Programme for International Student Assessment (PISA) designs as an example. A series of simulation studies show that the IRT scaling models used in the PISA are robust to repeated use of the MST design in a subsequent cycle with fewer items and smaller sample sizes, while item-by-country interactions and items not-reached have negligible to modest effects on model parameter estimation, and omitted responses have the largest effect. The discussion section provides recommendations and implications for future MST designs and scaling models for ILSAs.




