Practical Considerations in Item Calibration With Small Samples Under Multistage Test Design: A Case Study

ETS Research Report Series (Q3, Social Sciences) | Pub Date: 2024-02-04 | DOI: 10.1002/ets2.12376
Hongwen Guo, Matthew S. Johnson, Daniel F. McCaffrey, Lixong Gu

Abstract

Multistage testing (MST) designs have been gaining attention and popularity in educational assessment. For testing programs with small test-taker samples, calibrating new items to replenish the item pool is challenging. In this research, we used the item pools from an operational MST program to illustrate how studies can build on the literature and on program-specific data to help close the gap between research and practice and to support sound psychometric decisions about small-sample issues. The studies examined the choice of item calibration method, data collection designs that increase sample sizes, and the item response theory (IRT) models used to produce the score conversion tables. Our results showed that, with small samples, the fixed parameter calibration (FIPC) method consistently performed best for calibrating new items, compared with the traditional separate calibration with scaling method and a newer calibration method based on minimum discriminant information adjustment. In addition, concurrent FIPC calibration using data from multiple administrations improved parameter estimation for new items. However, because of the program-specific settings, a simpler model may not improve on current practice when the sample size is small and the initial item pools were well calibrated with a two-parameter logistic model using data from a large field trial.
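To make the idea behind fixed parameter calibration concrete, the following is a minimal sketch under simplifying assumptions, not the authors' implementation. It holds the parameters of already-calibrated (anchor) items fixed under a two-parameter logistic model, computes each test taker's ability posterior from the anchor responses only, and then picks the new item's discrimination and difficulty by a coarse grid search. It runs a single pass with no prior updating, whereas operational FIPC software typically iterates. All names (`p_2pl`, `posterior_weights`, `calibrate_new_item`), the simulated data, and the grid ranges are hypothetical choices made for illustration.

```python
import math
import random


def p_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def posterior_weights(responses, anchors, grid, prior):
    """Posterior over the ability grid for one test taker.

    Only anchor items are used, and their (a, b) parameters stay fixed.
    """
    weights = []
    for theta, w in zip(grid, prior):
        like = w
        for u, (a, b) in zip(responses, anchors):
            p = p_2pl(theta, a, b)
            like *= p if u == 1 else 1.0 - p
        weights.append(like)
    total = sum(weights)
    return [w / total for w in weights]


def calibrate_new_item(new_resp, posteriors, grid, a_values, b_values):
    """Grid search for the (a, b) that maximizes the expected log-likelihood."""
    # Expected number of test takers and of correct responses at each grid point.
    n_k = [0.0] * len(grid)
    r_k = [0.0] * len(grid)
    for u, post in zip(new_resp, posteriors):
        for k, w in enumerate(post):
            n_k[k] += w
            r_k[k] += w * u
    best = None
    for a in a_values:
        for b in b_values:
            ll = 0.0
            for theta, n, r in zip(grid, n_k, r_k):
                p = p_2pl(theta, a, b)
                ll += r * math.log(p) + (n - r) * math.log(1.0 - p)
            if best is None or ll > best[0]:
                best = (ll, a, b)
    return best[1], best[2]


# Simulated small-sample example; every value below is an assumption.
random.seed(0)
grid = [-3.0 + 0.2 * k for k in range(31)]
prior = [math.exp(-0.5 * t * t) for t in grid]  # unnormalized N(0, 1) weights
anchors = [(random.uniform(0.8, 1.6), random.uniform(-1.5, 1.5)) for _ in range(20)]
true_a, true_b = 1.2, 0.3  # generating parameters of the hypothetical new item

posteriors, new_resp = [], []
for _ in range(300):
    theta = random.gauss(0.0, 1.0)
    resp = [int(random.random() < p_2pl(theta, a, b)) for a, b in anchors]
    posteriors.append(posterior_weights(resp, anchors, grid, prior))
    new_resp.append(int(random.random() < p_2pl(theta, true_a, true_b)))

a_values = [0.4 + 0.1 * k for k in range(22)]
b_values = [-2.0 + 0.1 * k for k in range(41)]
a_hat, b_hat = calibrate_new_item(new_resp, posteriors, grid, a_values, b_values)
print("estimated a, b:", a_hat, b_hat)
```

Because the anchor parameters never change, the new item lands on the existing scale without a separate linking step, which is the practical appeal of the approach when samples are small. A production calibration would replace the grid search with a proper optimizer and repeat the posterior and estimation steps until convergence.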