Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation

IF 1.4 4区 心理学 Q3 PSYCHOLOGY, APPLIED Journal of Educational Measurement Pub Date : 2023-02-19 DOI:10.1111/jedm.12360
Jodi M. Casabianca, John R. Donoghue, Hyo Jeong Shin, Szu-Fu Chao, Ikkyu Choi
{"title":"Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation","authors":"Jodi M. Casabianca,&nbsp;John R. Donoghue,&nbsp;Hyo Jeong Shin,&nbsp;Szu-Fu Chao,&nbsp;Ikkyu Choi","doi":"10.1111/jedm.12360","DOIUrl":null,"url":null,"abstract":"<p>Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios, there tends to be a large proportion of missing data, yielding sparse matrices and estimation issues. In this article, we explore the impact of different types of connectedness, or linkage, brought about by using a linkage set—a collection of responses scored by most or all raters. We also explore the impact of the properties and composition of the linkage set, the different connectedness yielded from different rating designs, and the role of scores from automated scoring engines. In designing monitoring systems using the rater response version of the generalized partial credit model, the study results suggest use of a linkage set, especially a large one that is comprised of responses representing the full score scale. Results also show that a double-human-scoring design provides more connectedness than a design with one human and an automated scoring engine. Furthermore, scores from automated scoring engines do not provide adequate connectedness. We discuss considerations for operational implementation and further study.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 3","pages":"428-454"},"PeriodicalIF":1.4000,"publicationDate":"2023-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational Measurement","FirstCategoryId":"102","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jedm.12360","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"PSYCHOLOGY, APPLIED","Score":null,"Total":0}
引用次数: 1

Abstract

Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios, there tends to be a large proportion of missing data, yielding sparse matrices and estimation issues. In this article, we explore the impact of different types of connectedness, or linkage, brought about by using a linkage set—a collection of responses scored by most or all raters. We also explore the impact of the properties and composition of the linkage set, the different connectedness yielded from different rating designs, and the role of scores from automated scoring engines. In designing monitoring systems using the rater response version of the generalized partial credit model, the study results suggest use of a linkage set, especially a large one that is comprised of responses representing the full score scale. Results also show that a double-human-scoring design provides more connectedness than a design with one human and an automated scoring engine. Furthermore, scores from automated scoring engines do not provide adequate connectedness. We discuss considerations for operational implementation and further study.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用链接集提高评价响应模型估计中的连通性
与使用标准绩效指标相比,使用项目反应理论来模拟评分者效应为评分者监测和诊断提供了另一种解决方案。为了拟合这样的模型,评级数据必须充分连接,以便估计评级效应。由于在大规模测试场景中使用的流行评级设计,往往存在很大比例的缺失数据,从而产生稀疏矩阵和估计问题。在本文中,我们探讨了不同类型的连通性或联系的影响,通过使用联系集(由大多数或所有评分者评分的回答集合)带来的影响。我们还探讨了链接集的属性和组成的影响,不同评级设计产生的不同连通性,以及自动评分引擎得分的作用。在使用广义部分信用模型的评分反应版本设计监测系统时,研究结果建议使用链接集,特别是由代表满分量表的反应组成的大型链接集。结果还表明,与一个人和自动评分引擎的设计相比,双人评分设计提供了更多的连通性。此外,来自自动评分引擎的分数不能提供足够的连接性。我们讨论了操作实施和进一步研究的考虑。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
2.30
自引率
7.70%
发文量
46
期刊介绍: The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings, as well as be of interest to measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.
期刊最新文献
Sequential Reservoir Computing for Log File‐Based Behavior Process Data Analyses Issue Information Exploring Latent Constructs through Multimodal Data Analysis Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs Modeling Nonlinear Effects of Person‐by‐Item Covariates in Explanatory Item Response Models: Exploratory Plots and Modeling Using Smooth Functions
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1