Calculating Bias in Test Score Equating in a NEAT Design.

Applied Psychological Measurement · IF 1.2 · JCR Q4 (Psychology, Mathematical) · Pub Date: 2025-03-24 · eCollection Date: 2025-10-01 · Pages: 350-366 · DOI: 10.1177/01466216251330305
Marie Wiberg, Inga Laukaityte
Citations: 0

Abstract

Test score equating is used to make scores from different test forms comparable, even when groups differ in ability. In practice, the non-equivalent groups with anchor test (NEAT) design is commonly used. The overall aim was to compare the amount of bias under different conditions when using either chained equating or frequency estimation with five different criterion functions: the identity function, linear equating, equipercentile equating, chained equating, and frequency estimation. We used real test data from a multiple-choice, binary-scored college admissions test to illustrate that the choice of criterion function matters. Further, we simulated data in line with the empirical data to examine differences in ability between groups, in item difficulty, in anchor test form and regular test form length, in correlations between the anchor test form and the regular test forms, and in sample size. The results indicate that how bias is defined heavily affects the conclusions we draw about which equating method is to be preferred in different scenarios. Practical implications for standardized tests are given, together with recommendations on how to calculate bias when evaluating equating transformations.
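The abstract's central point is that bias depends on the criterion function the estimated equating is compared against. A minimal sketch of that computation, assuming hypothetical summary statistics, a uniform score distribution, linear equating as the estimated transformation, and the identity function as the criterion (all names and numbers are illustrative, not taken from the study):

```python
import numpy as np

def linear_equating(x, mu_x, sd_x, mu_y, sd_y):
    """Linear equating: map a form-X score x onto the form-Y scale."""
    return mu_y + (sd_y / sd_x) * (x - mu_x)

def identity_criterion(x):
    """Identity criterion function: treat the two forms as already equivalent."""
    return x

# Hypothetical summary statistics for two 40-item forms.
mu_x, sd_x = 22.0, 6.0   # form X mean and SD
mu_y, sd_y = 24.0, 6.5   # form Y mean and SD

scores = np.arange(0, 41)                      # possible raw scores 0..40
weights = np.ones_like(scores, float) / scores.size  # uniform weights, for illustration

estimated = linear_equating(scores, mu_x, sd_x, mu_y, sd_y)
criterion = identity_criterion(scores)

# Conditional bias at each score point, and a weighted overall summary.
bias_x = estimated - criterion
weighted_bias = np.sum(weights * bias_x)
```

Swapping in a different criterion function (e.g., an equipercentile or chained equating of the same data) changes `bias_x` and `weighted_bias`, which is precisely why the choice of criterion drives which equating method appears preferable.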

Source journal: Applied Psychological Measurement
CiteScore: 2.30 · Self-citation rate: 8.30% · Annual articles: 50
Journal description: Applied Psychological Measurement publishes empirical research on the application of techniques of psychological measurement to substantive problems in all areas of psychology and related disciplines.