Item response theory-based continuous test norming.

IF 7.8 | CAS Zone 1 (Psychology) | JCR Q1, PSYCHOLOGY, MULTIDISCIPLINARY | Psychological Methods | Published: 2024-10-14 | DOI: 10.1037/met0000686

Hannah M Heister, Casper J Albers, Marie Wiberg, Marieke E Timmerman

Citations: 0

Abstract

In norm-referenced psychological testing, an individual's performance is expressed relative to a reference population using a standardized score, such as an intelligence quotient (IQ) score. The reference population can depend on a continuous variable, such as age. Current continuous norming methods transform the raw score into an age-dependent standardized score. These methods share the shortcoming of relying solely on the raw test scores, ignoring valuable information in the individual item responses. Instead of modeling the raw test scores, we propose modeling the item scores with a Bayesian two-parameter logistic (2PL) item response theory model in which the mean and variance of the latent trait distribution depend on age; we call this approach 2PL-norm. Norms are then derived from the estimated latent trait score and the age-dependent distribution parameters. Simulations show that norms from 2PL-norm are overall more accurate than those from the most popular raw-score-based norming methods, cNORM and generalized additive models for location, scale, and shape (GAMLSS). Furthermore, the credible intervals of 2PL-norm show clearly better coverage than the confidence intervals of the raw-score-based methods. The only weakness of 2PL-norm is its slightly lower performance at the tails of the norms. Among the raw-score-based methods, GAMLSS outperforms cNORM. For empirical practice, this suggests using 2PL-norm if the model assumptions hold. If they do not, or if interest lies solely in point estimates for extreme trait positions, GAMLSS-based norming is the better alternative. The use of 2PL-norm is illustrated and compared with GAMLSS and cNORM on empirical data, and code is provided so that users can readily apply 2PL-norm to their own normative data. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
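To make the 2PL-norm idea more concrete, here is a minimal Python sketch written under stated assumptions. It is not the authors' code. The paper fits a Bayesian model; this sketch instead treats the item parameters (a, b) as known. It uses hypothetical linear and log-linear forms for the age-dependent trait mean mu_age and SD sigma_age. It estimates theta by expected a posteriori (EAP) grid quadrature under the age-specific normal prior. The norm is then z = (theta - mu(age)) / sigma(age), mapped to an IQ-type scale (100 + 15z) and a percentile. The hypothetical helper norm_interval shows how posterior draws of (theta, mu, sigma) would propagate into an equal-tailed interval for the norm score.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct (1) response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mu_age(age):
    """HYPOTHETICAL age-dependent trait mean (linear in age), for illustration only."""
    return -2.0 + 0.2 * age


def sigma_age(age):
    """HYPOTHETICAL age-dependent trait SD (log-linear, so it stays positive)."""
    return math.exp(-0.1 + 0.01 * age)


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def eap_theta(responses, items, age, n_points=201):
    """EAP estimate of theta: posterior mean under the age-specific N(mu, sigma^2)
    prior, computed on an evenly spaced grid over mu +/- 6 sigma.
    `items` is a list of (a, b) pairs, `responses` a matching list of 0/1 scores."""
    mu, sigma = mu_age(age), sigma_age(age)
    lo = mu - 6.0 * sigma
    step = 12.0 * sigma / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        t = lo + k * step
        w = normal_pdf(t, mu, sigma)          # prior density at this grid point
        for x, (a, b) in zip(responses, items):
            p = p_2pl(t, a, b)
            w *= p if x == 1 else 1.0 - p     # multiply in each item's likelihood
        num += t * w
        den += w
    return num / den                          # grid spacing cancels in the ratio


def norm_score(theta, age, mean=100.0, sd=15.0):
    """Standardize theta against the age-specific trait distribution.
    Returns (z, score on a mean/sd scale such as IQ, percentile in [0, 100])."""
    z = (theta - mu_age(age)) / sigma_age(age)
    percentile = 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z, mean + sd * z, percentile


def norm_interval(draws, level=0.95, mean=100.0, sd=15.0):
    """HYPOTHETICAL helper: equal-tailed interval of the norm score over posterior
    draws of (theta, mu, sigma), using nearest-rank quantiles."""
    scores = sorted(mean + sd * (t - m) / s for t, m, s in draws)
    n = len(scores)
    lo_idx = int(math.floor((1.0 - level) / 2.0 * (n - 1)))
    hi_idx = int(math.ceil((1.0 + level) / 2.0 * (n - 1)))
    return scores[lo_idx], scores[hi_idx]


if __name__ == "__main__":
    items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]  # HYPOTHETICAL (a, b) pairs
    theta_hat = eap_theta([1, 1, 0, 0], items, age=10)
    z, score, pct = norm_score(theta_hat, age=10)
    print(f"theta={theta_hat:.2f}  z={z:.2f}  score={score:.1f}  percentile={pct:.1f}")
```

The same raw response pattern yields different norm scores at different ages because the reference distribution N(mu(age), sigma(age)^2) moves with age. In the Bayesian model, uncertainty in theta and in the age-dependent parameters would both enter the interval. That is the mechanism behind the credible-interval coverage the abstract reports.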

Source journal

Psychological Methods (PSYCHOLOGY, MULTIDISCIPLINARY)

CiteScore: 13.10
Self-citation rate: 7.10%
Articles published: 159

Journal description: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.