Evaluating the Quantity-Quality Trade-off in the Selection of Anchor Items: a Vertical Scaling Approach

Practical Assessment, Research and Evaluation (JCR Q2, Social Sciences) Pub Date: 2011-04-01 DOI: 10.7275/NNCY-EW26
Florian Pibal, H. Cesnik
Citations: 4

Abstract

When administering tests across grades, vertical scaling is often employed to place scores from different tests on a common overall scale so that test-takers’ progress can be tracked. In order to be able to link the results across grades, however, common items are needed that are included in both test forms. In the literature there seems to be no clear agreement about the ideal number of common items. In line with some scholars, we argue that a greater number of anchor items bears a higher risk of unwanted effects like displacement, item drift, or undesired fit statistics and that having fewer psychometrically well-functioning anchor items can sometimes be more desirable. In order to demonstrate this, a study was conducted that included the administration of a reading-comprehension test to 1,350 test-takers across grades 6 to 8. In employing a step-by-step approach, we found that the paradox of high item drift in test administrations across grades can be mitigated and eventually even be eliminated. At the same time, a positive side effect was an increase in the explanatory power of the empirical data. Moreover, it was found that scaling adjustment can be used to evaluate the effectiveness of a vertical scaling approach and, in certain cases, can lead to more accurate results than the use of calibrated anchor items.
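The step-by-step idea in the abstract (keeping fewer, better-behaved anchor items) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes item difficulties in logits have already been estimated separately for two adjacent grades, uses simple mean-mean linking, and drops one anchor at a time while its drift exceeds a cutoff. The function name `prune_anchors`, the 0.5-logit `threshold`, the `min_items` floor, and the example values are all hypothetical choices made for illustration.

```python
from statistics import mean


def prune_anchors(b_lower, b_upper, threshold=0.5, min_items=3):
    """Iteratively drop the anchor item with the largest drift.

    b_lower, b_upper: dicts mapping item id -> difficulty (logits),
    estimated separately in the lower and the upper grade.
    threshold and min_items are illustrative assumptions.
    Returns (kept_item_ids, linking_shift).
    """
    # Only items present in both calibrations can serve as anchors.
    kept = sorted(set(b_lower) & set(b_upper))
    while True:
        # Mean-mean linking constant from the current anchor set.
        shift = mean(b_upper[i] - b_lower[i] for i in kept)
        # Drift of an item: its own difference minus the common shift.
        drift = {i: (b_upper[i] - b_lower[i]) - shift for i in kept}
        worst = max(kept, key=lambda i: abs(drift[i]))
        # Stop once every item is within the cutoff or the set is minimal.
        if abs(drift[worst]) <= threshold or len(kept) <= min_items:
            return kept, shift
        kept.remove(worst)


# Hypothetical difficulties: four stable anchors and one drifting item (a5).
lower = {"a1": -1.0, "a2": -0.25, "a3": 0.5, "a4": 1.25, "a5": 0.0}
upper = {"a1": -1.5, "a2": -0.75, "a3": 0.0, "a4": 0.75, "a5": 1.0}
kept, shift = prune_anchors(lower, upper)
print(kept, shift)
```

In this toy data a single drifting item pulls the linking constant away from the value implied by the other four anchors; removing it leaves a smaller but consistent anchor set, which is the quantity-quality trade-off the abstract describes.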