项目反应理论核等价在混合格式测试中的有效性分析

IF 1 4区心理学 Q4 PSYCHOLOGY, MATHEMATICAL Applied Psychological Measurement Pub Date : 2023-10-19 DOI:10.1177/01466216231209757

Joakim Wallmark, Maria Josefsson, Marie Wiberg

{"title":"项目反应理论核等价在混合格式测试中的有效性分析","authors":"Joakim Wallmark, Maria Josefsson, Marie Wiberg","doi":"10.1177/01466216231209757","DOIUrl":null,"url":null,"abstract":"This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors. The application of IRT models for presmoothing yielded smaller standard error of equating than the log-linear presmoothing approach. When test data were generated using IRT models, IRT-based methods proved less biased than log-linear kernel equating. However, when data were simulated without IRT models, log-linear kernel equating showed less bias. Overall, IRT kernel equating shows great promise when equating mixed-format tests.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"10 1","pages":"0"},"PeriodicalIF":1.0000,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests\",\"authors\":\"Joakim Wallmark, Maria Josefsson, Marie Wiberg\",\"doi\":\"10.1177/01466216231209757\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors. The application of IRT models for presmoothing yielded smaller standard error of equating than the log-linear presmoothing approach. When test data were generated using IRT models, IRT-based methods proved less biased than log-linear kernel equating. However, when data were simulated without IRT models, log-linear kernel equating showed less bias. Overall, IRT kernel equating shows great promise when equating mixed-format tests.\",\"PeriodicalId\":48300,\"journal\":{\"name\":\"Applied Psychological Measurement\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2023-10-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Psychological Measurement\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/01466216231209757\",\"RegionNum\":4,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"PSYCHOLOGY, MATHEMATICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Psychological Measurement","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/01466216231209757","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"PSYCHOLOGY, MATHEMATICAL","Score":null,"Total":0}

引用次数: 0

摘要

本研究旨在通过将项目反应理论(IRT)核等价与IRT观察得分等价和对数线性预平滑核等价进行比较，评价项目反应理论核等价在混合格式测试中的表现。通过模拟和实际数据应用，在锚点试验(NEAT)抽样设计的等效组(EG)和非等效组(non-equivalent groups)下进行了比较。为了防止对IRT方法的偏见，在使用和不使用IRT模型的情况下对数据进行了模拟。结果表明，IRT内核相等和IRT观察到的分数相等之间的差异是最小的，无论是在相等的分数和它们的标准误差方面。应用IRT模型进行预平滑比采用对数线性预平滑方法得到更小的方程标准误差。当使用IRT模型生成测试数据时，基于IRT的方法被证明比对数线性核方程的偏差更小。然而，当没有IRT模型的数据模拟时，对数线性核方程显示出较小的偏差。总的来说，IRT内核等价在等价混合格式测试时显示了很大的希望。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests

This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors. The application of IRT models for presmoothing yielded smaller standard error of equating than the log-linear presmoothing approach. When test data were generated using IRT models, IRT-based methods proved less biased than log-linear kernel equating. However, when data were simulated without IRT models, log-linear kernel equating showed less bias. Overall, IRT kernel equating shows great promise when equating mixed-format tests.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Applied Psychological Measurement Multiple-

CiteScore

2.30

自引率

8.30%

发文量

期刊介绍： Applied Psychological Measurement publishes empirical research on the application of techniques of psychological measurement to substantive problems in all areas of psychology and related disciplines.