Purpose: This study examined the impact of differential item functioning (DIF) on ability estimation in a computerized adaptive testing (CAT) environment using real response data from the 2017 Korean Medical Licensing Examination (KMLE). We hypothesized that excluding gender-based DIF items would improve estimation accuracy, particularly for examinees at the extremes of the ability scale.
Methods: The study was conducted in 2 steps: (1) DIF detection and (2) post-hoc simulation. The analysis used data from 3,259 examinees who completed all 360 dichotomous items. Gender-based DIF was detected with the residual-based DIF method (reference group: males; focal group: females). Two CAT conditions (all items vs. DIF-excluded) were compared against a "true θ" estimated from a fixed-form test of 264 non-DIF items. Accuracy was evaluated using bias, root mean square error (RMSE), and correlation with true θ.
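The three accuracy criteria named above can be illustrated with a minimal sketch. It assumes bias is defined as the mean of (estimated θ − true θ), so that a negative value indicates underestimation; the function name `accuracy_metrics` and the toy values are hypothetical and are not taken from the study.

```python
import math

def accuracy_metrics(theta_hat, theta_true):
    """Return (bias, rmse, pearson_r) for CAT estimates against reference thetas.

    Assumption: bias = mean(theta_hat - theta_true); negative = underestimation.
    """
    n = len(theta_true)
    if n != len(theta_hat) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    errors = [h - t for h, t in zip(theta_hat, theta_true)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation between estimated and reference abilities.
    mean_h = sum(theta_hat) / n
    mean_t = sum(theta_true) / n
    cov = sum((h - mean_h) * (t - mean_t) for h, t in zip(theta_hat, theta_true))
    var_h = sum((h - mean_h) ** 2 for h in theta_hat)
    var_t = sum((t - mean_t) ** 2 for t in theta_true)
    r = cov / math.sqrt(var_h * var_t)

    return bias, rmse, r

# Toy illustration only (made-up values, not KMLE data).
toy_true = [-1.0, 0.0, 1.0]
toy_hat = [-0.5, 0.0, 0.5]
bias, rmse, r = accuracy_metrics(toy_hat, toy_true)
```

In the toy case the estimates are shrunk toward zero, so the errors cancel in the bias while still contributing to the RMSE; this is one reason bias and RMSE can move in different directions across conditions.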
Results: Excluding DIF items from the CAT improved accuracy: RMSE decreased and the correlation with true θ increased, although bias was slightly larger in magnitude. Gender-specific analyses showed that DIF removal reduced the underestimation of female ability but increased the underestimation of male ability, yielding estimates that were fairer across genders. When DIF items were included, estimation errors were more pronounced at both low and high ability levels.
Conclusion: Managing DIF in CAT-based high-stakes examinations can enhance fairness and precision. Using real examinee data, this study provides practical evidence of the implications of DIF for CAT-based measurement and supports fairness-oriented test design.