{"title":"Examining Differential Item Functioning from a Multidimensional IRT Perspective","authors":"Terry A. Ackerman, Ye Ma","doi":"10.1007/s11336-024-09965-6","DOIUrl":null,"url":null,"abstract":"<p>Differential item functioning (DIF) is a standard analysis for every testing company. Research has demonstrated that DIF can result when test items measure different ability composites, and the groups being examined for DIF exhibit distinct underlying ability distributions on those composite abilities. In this article, we examine DIF from a two-dimensional multidimensional item response theory (MIRT) perspective. We begin by delving into the compensatory MIRT model, illustrating and how items and the composites they measure can be graphically represented. Additionally, we discuss how estimated item parameters can vary based on the underlying latent ability distributions of the examinees. Analytical research highlighting the consequences of ignoring dimensionally and applying unidimensional IRT models, where the two-dimensional latent space is mapped onto a unidimensional, is reviewed. Next, we investigate three different approaches to understanding DIF from a MIRT standpoint: 1. Analytically Uniform and Nonuniform DIF: When two groups of interest have different two-dimensional ability distributions, a unidimensional model is estimated. 2. Accounting for complete latent ability space: We emphasize the importance of considering the entire latent ability space when using DIF conditional approaches, which leads to the mitigation of DIF effects. 3. Scenario-Based DIF: Even when underlying two-dimensional distributions are identical for two groups, differing problem-solving approaches can still lead to DIF. Modern software programs facilitate routine DIF procedures for comparing response data from two identified groups of interest. The real challenge is to identify why DIF could occur with flagged items. Thus, as a closing challenge, we present four items (Appendix A) from a standardized test and invite readers to identify which group was favored by a DIF analysis.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":"50 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Psychometrika","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1007/s11336-024-09965-6","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
Differential item functioning (DIF) analysis is standard practice at every testing company. Research has demonstrated that DIF can result when test items measure different ability composites and the groups being examined exhibit distinct underlying distributions on those composite abilities. In this article, we examine DIF from a two-dimensional multidimensional item response theory (MIRT) perspective. We begin by delving into the compensatory MIRT model, illustrating how items and the composites they measure can be represented graphically. Additionally, we discuss how estimated item parameters can vary with the underlying latent ability distributions of the examinees. We then review analytical research highlighting the consequences of ignoring dimensionality and applying unidimensional IRT models, in which the two-dimensional latent space is mapped onto a unidimensional scale. Next, we investigate three approaches to understanding DIF from a MIRT standpoint:

1. Analytically derived uniform and nonuniform DIF: a unidimensional model is estimated for two groups of interest whose two-dimensional ability distributions differ.
2. Accounting for the complete latent ability space: we emphasize the importance of conditioning on the entire latent ability space in DIF procedures, which mitigates DIF effects.
3. Scenario-based DIF: even when the underlying two-dimensional distributions are identical for two groups, differing problem-solving approaches can still produce DIF.

Modern software makes routine DIF procedures for comparing response data from two identified groups straightforward. The real challenge is identifying why DIF occurs for flagged items. Thus, as a closing challenge, we present four items (Appendix A) from a standardized test and invite readers to identify which group a DIF analysis favored.
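To make the mechanism in the first approach concrete, the minimal sketch below (not from the article; every parameter value is a hypothetical illustration) simulates responses under the compensatory two-dimensional MIRT model, P(X = 1 | theta1, theta2) = 1 / (1 + exp(-(a1*theta1 + a2*theta2 + d))), for two groups whose bivariate ability distributions differ, and then applies a routine Mantel-Haenszel uniform-DIF check that conditions on a unidimensional total score.

```python
# A minimal sketch (not the authors' code): DIF arising when a compensatory
# two-dimensional MIRT item is screened with a unidimensional matching score.
# All group means, item loadings, and sample sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # examinees per group

# Two groups with different bivariate latent ability distributions:
# the focal group is higher on theta_1 but lower on theta_2.
theta_ref = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=n)
theta_foc = rng.multivariate_normal([0.5, -0.5], np.eye(2), size=n)

def p_correct(theta, a, d):
    """Compensatory MIRT model: P = 1 / (1 + exp(-(a'theta + d)))."""
    return 1.0 / (1.0 + np.exp(-(theta @ a + d)))

# Anchor items load mainly on theta_1; the studied item mixes both dimensions.
anchors = [(np.array([1.0, 0.1]), d) for d in np.linspace(-1.5, 1.5, 10)]
studied_a, studied_d = np.array([0.4, 1.2]), 0.0

def simulate(theta):
    X = np.column_stack([rng.random(len(theta)) < p_correct(theta, a, d)
                         for a, d in anchors]).astype(int)
    y = (rng.random(len(theta)) < p_correct(theta, studied_a, studied_d)).astype(int)
    return X.sum(axis=1), y  # unidimensional matching score, studied-item response

score_ref, y_ref = simulate(theta_ref)
score_foc, y_foc = simulate(theta_foc)

# Mantel-Haenszel common odds ratio across score strata, a routine uniform-DIF
# statistic: values away from 1 flag the studied item.
num = den = 0.0
for s in range(len(anchors) + 1):
    r, f = y_ref[score_ref == s], y_foc[score_foc == s]
    if len(r) == 0 or len(f) == 0:
        continue
    t = len(r) + len(f)
    num += r.sum() * (len(f) - f.sum()) / t  # ref correct * focal incorrect
    den += f.sum() * (len(r) - r.sum()) / t  # focal correct * ref incorrect
print("MH odds ratio:", num / den)
```

Under these hypothetical values the matching score is dominated by theta_1, so conditioning on it leaves the groups' theta_2 difference unaccounted for, and the theta_2-loaded studied item is flagged as favoring the reference group even though its parameters are identical for both groups. That is the dimensionality-driven DIF the abstract describes, and it disappears if the DIF procedure conditions on the complete two-dimensional latent space.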
Journal introduction:
The journal Psychometrika is devoted to the advancement of theory and methodology for behavioral data in psychology, education, and the social and behavioral sciences generally. Its coverage is offered in two sections: Theory and Methods (T&M), and Application Reviews and Case Studies (ARCS). T&M articles present original research and reviews on the development of quantitative models, statistical methods, and mathematical techniques for evaluating data from psychology, the social and behavioral sciences, and related fields. Application Reviews can be integrative, drawing together disparate methodologies for applications, or comparative and evaluative, discussing advantages and disadvantages of one or more methodologies in applications. Case Studies highlight methodology that deepens understanding of substantive phenomena through more informative data analysis, or more elegant data description.