Better Data From Better Measurements Using Computerized Adaptive Testing

D. Weiss
Journal of Methods and Measurement in the Social Sciences, pp. 1-27. Published 2011-10-03. DOI: 10.2458/V2I1.12351
Citations: 53

Abstract

The process of constructing a fixed-length conventional test frequently focuses on maximizing internal consistency reliability by selecting test items that are of average difficulty and high discrimination (a “peaked” test). Viewed from the perspective of item response theory (IRT), the effect of constructing such a test is that scores are precise for examinees whose trait levels are near the point at which the test is peaked; as examinee trait levels deviate from the mean, the precision of their scores decreases substantially. Results of a small simulation study demonstrate that when peaked tests are “off target” for an examinee, the resulting scores are biased and have spuriously high standard deviations, reflecting substantial amounts of error. These errors can reduce the correlations of such scores with other variables and adversely affect the results of standard statistical tests. By contrast, scores from adaptive tests are essentially unbiased and have standard deviations that are much closer to true values. Basic concepts of adaptive testing are introduced, and fully adaptive computerized tests (CATs) based on IRT are described. Several examples of response records from CATs are discussed to illustrate how CATs function. Some operational issues, including item exposure, content balancing, and enemy items, are also briefly discussed. It is concluded that because a CAT constructs a unique test for each examinee, scores from CATs will be more precise and should provide better data for social science research and applications.
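
The precision argument in the abstract can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) IRT model and maximum-information item selection; the abstract names neither, so both are stand-ins chosen for illustration. The item parameters, the 20-item peaked test, and the 13-item pool are hypothetical values, not data from the paper.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, not taken from the paper)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a single 2PL item at trait level theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Asymptotic standard error of the trait estimate: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)

def next_item(theta_hat, pool, used):
    """Return the index of the unused pool item that is most informative at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical peaked test: 20 items, all with discrimination 1.5 and difficulty 0.
peaked_test = [(1.5, 0.0)] * 20

# Hypothetical adaptive pool: difficulties from -3.0 to 3.0 in steps of 0.5.
pool = [(1.5, -3.0 + 0.5 * k) for k in range(13)]

if __name__ == "__main__":
    # The standard error grows as the examinee moves away from the peak at 0.
    for theta in (0.0, 1.0, 2.0, 3.0):
        print(f"theta={theta:+.1f}  SE on peaked test={standard_error(theta, peaked_test):.3f}")
    # An adaptive test would instead pick the item matched to the current estimate.
    chosen = next_item(2.0, pool, used=set())
    print("item chosen for theta_hat=2.0 has difficulty", pool[chosen][1])
```

Under these assumptions the peaked test is most precise at the trait level where it is peaked and loses precision elsewhere, whereas the selection rule picks the item whose difficulty matches the current trait estimate, which is the intuition behind a CAT building a different test for each examinee.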