Hongwen Guo, Matthew S. Johnson, Daniel F. McCaffrey, Lixong Gu
ETS Research Report Series. Published 2024-02-04. DOI: 10.1002/ets2.12376
Practical Considerations in Item Calibration With Small Samples Under Multistage Test Design: A Case Study
The multistage testing (MST) design has been gaining attention and popularity in educational assessment. For testing programs with small test-taker samples, calibrating new items to replenish the item pool is challenging. In this research, we used the item pools from an operational MST program to illustrate how studies built on the literature and on program-specific data can help fill the gaps between research and practice and support sound psychometric decisions about small-sample issues. The studies examined the choice of item calibration method, data collection designs that increase sample sizes, and the item response theory (IRT) model used to produce the score conversion tables. Our results showed that, with small samples, the fixed item parameter calibration (FIPC) method consistently performed best for calibrating new items, compared with the traditional method of separate calibration followed by scaling and with a newer calibration approach based on minimum discriminant information adjustment. In addition, concurrent FIPC calibration with data pooled from multiple administrations also improved the parameter estimates for new items. However, given the program-specific settings, a simpler IRT model may not improve on current practice when sample sizes are small and the initial item pools were already well calibrated with a two-parameter logistic (2PL) model using data from a large field trial.
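To make the idea behind FIPC more concrete, here is a rough sketch. It is not the procedure or software used in this study, and it rests on several assumptions that do not come from the paper: all items are dichotomous and follow a 2PL model without the 1.7 scaling constant; a fixed quadrature grid is used; the ability prior is re-estimated as an empirical histogram on each EM cycle; and the M-step takes a few Newton-Raphson steps per new item. The functions `p2pl` and `fipc_2pl` and the simulated data are hypothetical. The anchor (pool) items keep their known parameters, and only the new items are estimated. The `taken` mask mimics MST routing, under which each examinee sees only some of the items.

```python
import numpy as np


def p2pl(theta, a, b):
    """2PL probability of a correct response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def fipc_2pl(resp, taken, a_fix, b_fix, n_new, n_cycles=50, n_nodes=41):
    """Hypothetical FIPC sketch: anchor items fixed, new items estimated by EM.

    resp  : (persons, items) 0/1 responses; anchor columns first, then new items
    taken : (persons, items) boolean mask, True where the item was administered
    """
    nodes = np.linspace(-4.0, 4.0, n_nodes)
    prior = np.exp(-0.5 * nodes**2)
    prior /= prior.sum()                              # start from a N(0, 1) prior
    n_anchor = len(a_fix)
    a = np.concatenate([a_fix, np.ones(n_new)])       # new items start at a=1, b=0
    b = np.concatenate([b_fix, np.zeros(n_new)])
    X = np.column_stack([nodes, np.ones(n_nodes)])    # logit = a*theta + c, c = -a*b
    right, wrong = taken * resp, taken * (1 - resp)   # unadministered items drop out
    for _ in range(n_cycles):
        # E-step: posterior over nodes per person, using all administered items
        P = np.clip(p2pl(nodes[:, None], a[None, :], b[None, :]), 1e-10, 1 - 1e-10)
        loglik = right @ np.log(P).T + wrong @ np.log(1 - P).T   # (persons, nodes)
        loglik += np.log(np.maximum(prior, 1e-300))[None, :]
        loglik -= loglik.max(axis=1, keepdims=True)
        post = np.exp(loglik)
        post /= post.sum(axis=1, keepdims=True)
        prior = post.mean(axis=0)                     # empirical-histogram prior update
        # M-step: update only the new items; anchor parameters never change
        for j in range(n_anchor, n_anchor + n_new):
            n_q = post[taken[:, j]].sum(axis=0)                      # expected takers
            r_q = post[taken[:, j] & (resp[:, j] == 1)].sum(axis=0)  # expected correct
            beta = np.array([a[j], -a[j] * b[j]])
            for _ in range(5):                        # Newton-Raphson on (a, c)
                p = 1.0 / (1.0 + np.exp(-(X @ beta)))
                grad = X.T @ (r_q - n_q * p)
                hess = X.T @ (X * (n_q * p * (1 - p))[:, None])
                beta += np.linalg.solve(hess + 1e-8 * np.eye(2), grad)
            a[j], b[j] = beta[0], -beta[1] / beta[0]
    return a[n_anchor:], b[n_anchor:], prior


# Simulated example: 20 anchor items with known parameters, 5 new items,
# each new item administered to roughly half of the 2000 examinees.
rng = np.random.default_rng(2024)
n_persons, n_anchor, n_new = 2000, 20, 5
a_true = rng.uniform(0.8, 1.8, n_anchor + n_new)
b_true = rng.uniform(-1.5, 1.5, n_anchor + n_new)
theta = rng.normal(0.0, 1.0, n_persons)
resp = (rng.random((n_persons, n_anchor + n_new))
        < p2pl(theta[:, None], a_true, b_true)).astype(int)
taken = np.ones_like(resp, dtype=bool)
taken[:, n_anchor:] = rng.random((n_persons, n_new)) < 0.5
a_new, b_new, prior = fipc_2pl(resp, taken, a_true[:n_anchor], b_true[:n_anchor], n_new)
print("a error:", np.round(a_new - a_true[n_anchor:], 2))
print("b error:", np.round(b_new - b_true[n_anchor:], 2))
```

In this sketch, holding the anchor parameters at their pool values places the new items directly on the pool's scale. Separate calibration, by contrast, estimates all items freely and then needs a scaling step to link them to the pool. Concurrent calibration across administrations would correspond to stacking the response matrices from several administrations, with the `taken` mask marking which items each group saw.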