Evaluating the Douglas-Cohen IRT Goodness of Fit Measure With BIB Sampling of Items
Authors: John R. Donoghue, Adrienne N. Sgammato
Journal: Applied Psychological Measurement (impact factor 1.0; JCR Q4, Psychology, Mathematical)
Publication date: 2024-03-14
DOI: 10.1177/01466216241238740 (https://doi.org/10.1177/01466216241238740)
Citation count: 0
Abstract
Methods to detect item response theory (IRT) item-level misfit are typically derived assuming fixed test forms. However, IRT is also employed with more complicated test designs, such as the balanced incomplete block (BIB) design used in large-scale educational assessments. This study investigates two modifications of Douglas and Cohen’s 2001 nonparametric method of assessing item misfit, based on A) using block total score and B) pooling booklet level scores for analyzing BIB data. Block-level scores showed extreme inflation of Type I error for short blocks containing 5 or 10 items. The pooled booklet method yielded Type I error rates close to nominal [Formula: see text] in most conditions and had power to detect misfitting items. The study also found that the Douglas and Cohen procedure is only slightly affected by the presence of other misfitting items in the block. The pooled booklet method is recommended for practical applications of Douglas and Cohen’s method with BIB data.
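To make the balanced incomplete block (BIB) idea in the abstract concrete, the following is a minimal sketch of one textbook construction. It is not the design used in the study, whose block and booklet counts are not given here. It assumes 7 item blocks and 3 blocks per booklet, built by cyclically shifting the base set {0, 1, 3}; the function names `cyclic_bib` and `pair_counts` are hypothetical. The check confirms the "balanced" property: every pair of blocks shares a booklet equally often.

```python
from collections import Counter
from itertools import combinations


def cyclic_bib(num_blocks=7, base=(0, 1, 3)):
    """Build booklets by cyclically shifting a base set of block indices.

    The defaults are an illustrative assumption, not the study's design.
    """
    return [
        tuple(sorted((b + shift) % num_blocks for b in base))
        for shift in range(num_blocks)
    ]


def pair_counts(booklets):
    """Count how many booklets contain each unordered pair of blocks."""
    counts = Counter()
    for booklet in booklets:
        # Booklets are sorted, so each pair is always emitted as (low, high).
        counts.update(combinations(booklet, 2))
    return counts


booklets = cyclic_bib()
counts = pair_counts(booklets)

# How many booklets each block appears in.
block_use = Counter(block for booklet in booklets for block in booklet)

for booklet in booklets:
    print(booklet)
print("distinct pair frequencies:", set(counts.values()))
print("distinct block frequencies:", set(block_use.values()))
```

In a design like this no examinee sees every block, which is why a fit statistic has to be computed either from a single block's total score or from scores pooled across the booklets that contain the item, the two options the abstract compares.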
About the journal:
Applied Psychological Measurement publishes empirical research on the application of techniques of psychological measurement to substantive problems in all areas of psychology and related disciplines.