IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Journal of Educational Measurement, Vol. 62, No. 1, pp. 145-171
Published: 2025-01-13 | DOI: 10.1111/jedm.12425 | Impact Factor: 1.6 | Q3, Psychology, Applied
Tong Wu, Stella Y. Kim, Carl Westine, Michelle Boyer

Abstract

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating method that accounts for rater errors by utilizing item response theory (IRT) observed-score equating with a hierarchical rater model (HRM). Its effectiveness is compared to an IRT observed-score equating method using the generalized partial credit model across 16 rater combinations with varying levels of rater bias and variability. The results indicate that equating performance depends on the interaction between rater bias and variability across forms. Both the proposed and traditional methods demonstrated robustness in terms of bias and RMSE when rater bias and variability were similar between forms, with a few exceptions. However, when rater errors varied significantly across forms, the proposed method consistently produced more stable equating results. Differences in standard error between the methods were minimal under most conditions.
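As background for the method named in the abstract (this sketch is not drawn from the paper itself), IRT observed-score equating typically computes each form's model-implied observed-score distribution via the Lord-Wingersky recursion, here generalized to polytomous items scored under the generalized partial credit model (GPCM), and then equates the resulting distributions equipercentile-style. The item parameters, quadrature grid, and toy form below are illustrative assumptions, not values from the study.

```python
import numpy as np

def gpcm_probs(theta, a, deltas):
    """Category probabilities for one GPCM item scored 0..m.

    theta  : latent trait value
    a      : item discrimination
    deltas : step difficulties (length m)
    """
    # Cumulative sums of a*(theta - delta_v); the empty sum gives category 0.
    z = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(deltas)))))
    ez = np.exp(z - z.max())  # subtract max for numerical stability
    return ez / ez.sum()

def observed_score_dist(theta, items):
    """Lord-Wingersky recursion for polytomous items: the distribution
    of the summed score at a fixed theta, built item by item."""
    dist = np.array([1.0])  # before any items, P(score = 0) = 1
    for a, deltas in items:
        p = gpcm_probs(theta, a, deltas)
        new = np.zeros(len(dist) + len(deltas))
        for k, pk in enumerate(p):  # convolve with this item's categories
            new[k:k + len(dist)] += pk * dist
        dist = new
    return dist

def marginal_score_dist(items, quad_points, quad_weights):
    """Marginal observed-score distribution: average the conditional
    distributions over a discrete approximation to the ability density."""
    dists = [observed_score_dist(t, items) for t in quad_points]
    return np.average(dists, axis=0, weights=quad_weights)

# Hypothetical toy form: two items scored 0-2, one item scored 0-1 (max score 5).
items = [(1.0, [-0.5, 0.5]), (1.2, [0.0, 1.0]), (0.8, [0.3])]
theta_q = np.linspace(-4, 4, 41)            # quadrature points
w_q = np.exp(-0.5 * theta_q ** 2)           # unnormalized N(0,1) weights
w_q /= w_q.sum()
f = marginal_score_dist(items, theta_q, w_q)
print(f.round(4))  # probabilities over total scores 0..5
```

Running the same computation on both forms yields two observed-score distributions whose percentile ranks can then be matched; the paper's contribution is to replace the GPCM conditional probabilities with those implied by a hierarchical rater model so that rater bias and variability enter the score distributions directly.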
