Calculating Bias in Test Score Equating in a NEAT Design.

Applied Psychological Measurement · IF 1.2 · JCR Q4 (Psychology, Mathematical) · Pub Date: 2025-03-24 · eCollection Date: 2025-10-01 · Pages: 350-366 · DOI: 10.1177/01466216251330305
Marie Wiberg, Inga Laukaityte
Citations: 0

Abstract

Test score equating is used to make scores from different test forms comparable, even when groups differ in ability. In practice, the non-equivalent groups with anchor test (NEAT) design is commonly used. The overall aim was to compare the amount of bias under different conditions when using either chained equating or frequency estimation with five different criterion functions: the identity function, linear equating, equipercentile equating, chained equating, and frequency estimation. We used real test data from a multiple-choice, binary-scored college admissions test to illustrate that the choice of criterion function matters. Further, we simulated data in line with the empirical data to examine differences in ability between groups, in item difficulty, in anchor test form and regular test form length, in correlations between the anchor test form and the regular test forms, and in sample size. The results indicate that how bias is defined heavily affects the conclusions we draw about which equating method is to be preferred in different scenarios. Practical implications for standardized tests are given, together with recommendations on how to calculate bias when evaluating equating transformations.
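The abstract's central point is that bias depends on the criterion function the estimated equating is compared against. A minimal sketch of that computation, assuming hypothetical summary statistics, a uniform score distribution, linear equating as the estimated transformation, and the identity function as the criterion (all names and numbers are illustrative, not taken from the study):

```python
import numpy as np

def linear_equating(x, mu_x, sd_x, mu_y, sd_y):
    """Linear equating: map a form-X score x onto the form-Y scale."""
    return mu_y + (sd_y / sd_x) * (x - mu_x)

def identity_criterion(x):
    """Identity criterion function: treat the two forms as already equivalent."""
    return x

# Hypothetical summary statistics for two 40-item forms.
mu_x, sd_x = 22.0, 6.0   # form X mean and SD
mu_y, sd_y = 24.0, 6.5   # form Y mean and SD

scores = np.arange(0, 41)                      # possible raw scores 0..40
weights = np.ones_like(scores, float) / scores.size  # uniform weights, for illustration

estimated = linear_equating(scores, mu_x, sd_x, mu_y, sd_y)
criterion = identity_criterion(scores)

# Conditional bias at each score point, and a weighted overall summary.
bias_x = estimated - criterion
weighted_bias = np.sum(weights * bias_x)
```

Swapping in a different criterion function (e.g., an equipercentile or chained equating of the same data) changes `bias_x` and `weighted_bias`, which is precisely why the choice of criterion drives which equating method appears preferable.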

Source journal: Applied Psychological Measurement
CiteScore: 2.30 · Self-citation rate: 8.30% · Annual articles: 50
Journal description: Applied Psychological Measurement publishes empirical research on the application of techniques of psychological measurement to substantive problems in all areas of psychology and related disciplines.