Examining test fairness across gender in a computerised reading test: A comparison between the Rasch-based DIF technique and MIMIC
Xuelian Zhu, Vahid Aryadoust
Studies in Language Assessment, 2019
https://doi.org/10.58379/nvft3338
Citation count: 6
Abstract
Test fairness has been recognised as a fundamental requirement of test validation. Two quantitative approaches to investigating test fairness, the Rasch-based differential item functioning (DIF) detection method and a measurement invariance technique called multiple indicators, multiple causes (MIMIC), were adopted and compared in a test fairness study of the Pearson Test of English (PTE) Academic Reading test (n = 783). The Rasch partial credit model (PCM) showed no statistically significant uniform DIF across gender and, similarly, the MIMIC analysis showed that measurement invariance was maintained in the test. However, six pairs of significant non-uniform DIF (p < 0.05) were found in the DIF analysis. The results and a post-hoc content analysis are discussed, along with the theoretical and practical implications of the study for test developers and language assessment.