Examining Differential Item Functioning from a Multidimensional IRT Perspective

Psychometrika · Pub Date: 2024-04-05 · DOI: 10.1007/s11336-024-09965-6 · Impact Factor 2.9 · JCR Q1 (Mathematics, Interdisciplinary Applications) · CAS Zone 2 (Psychology)
Terry A. Ackerman, Ye Ma

Abstract

Differential item functioning (DIF) is a standard analysis for every testing company. Research has demonstrated that DIF can result when test items measure different ability composites and the groups being examined for DIF exhibit distinct underlying ability distributions on those composite abilities. In this article, we examine DIF from a two-dimensional multidimensional item response theory (MIRT) perspective. We begin by delving into the compensatory MIRT model, illustrating how items and the composites they measure can be graphically represented. Additionally, we discuss how estimated item parameters can vary based on the underlying latent ability distributions of the examinees. Analytical research highlighting the consequences of ignoring dimensionality and applying unidimensional IRT models, where the two-dimensional latent space is mapped onto a unidimensional scale, is reviewed. Next, we investigate three different approaches to understanding DIF from a MIRT standpoint:

1. Analytical uniform and nonuniform DIF: when two groups of interest have different two-dimensional ability distributions and a unidimensional model is estimated.

2. Accounting for the complete latent ability space: we emphasize the importance of considering the entire latent ability space when using conditional DIF approaches, which mitigates DIF effects.

3. Scenario-based DIF: even when the underlying two-dimensional distributions are identical for two groups, differing problem-solving approaches can still lead to DIF.

Modern software programs facilitate routine DIF procedures for comparing response data from two identified groups of interest. The real challenge is to identify why DIF occurs with flagged items. Thus, as a closing challenge, we present four items (Appendix A) from a standardized test and invite readers to identify which group was favored by a DIF analysis.
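The mechanism the abstract describes can be sketched numerically. The snippet below is a hypothetical illustration, not code from the article: the item parameters and group distributions are invented. It computes the compensatory two-dimensional MIRT (M2PL) response probability, the item's direction of best measurement in the θ1–θ2 plane, and shows how two groups with identical means on the composite θ1 + θ2 can still differ in expected performance on an item that loads mostly on θ2 — the kind of effect that surfaces as DIF when dimensionality is ignored.

```python
import numpy as np

def compensatory_mirt_p(theta1, theta2, a1, a2, d):
    """Compensatory two-dimensional M2PL response probability:
    P = 1 / (1 + exp(-(a1*theta1 + a2*theta2 + d)))."""
    z = a1 * theta1 + a2 * theta2 + d
    return 1.0 / (1.0 + np.exp(-z))

def item_angle_deg(a1, a2):
    """Angle (degrees from the theta1 axis) of the composite the item best measures."""
    return np.degrees(np.arctan2(a2, a1))

rng = np.random.default_rng(0)
n = 200_000

# Both groups have mean 0 on the composite theta1 + theta2,
# but opposite shifts on the separate dimensions (invented values).
ref = rng.multivariate_normal([0.25, -0.25], np.eye(2), size=n)
foc = rng.multivariate_normal([-0.25, 0.25], np.eye(2), size=n)

a1, a2, d = 0.4, 1.6, 0.0   # this item measures mostly theta2
p_ref = compensatory_mirt_p(ref[:, 0], ref[:, 1], a1, a2, d).mean()
p_foc = compensatory_mirt_p(foc[:, 0], foc[:, 1], a1, a2, d).mean()

print(f"item measurement direction: {item_angle_deg(a1, a2):.1f} degrees")
print(f"mean P(correct): reference={p_ref:.3f}  focal={p_foc:.3f}")
```

Matching the groups on the unidimensional composite hides the θ2 difference, so the focal group (higher mean θ2) outperforms the reference group on this θ2-dominant item; conditioning on the full (θ1, θ2) space instead would remove the difference, consistent with the article's second approach.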


Source journal: Psychometrika (Mathematics, Interdisciplinary Applications)
CiteScore: 4.40
Self-citation rate: 10.00%
Articles per year: 72
Review time: >12 weeks
Journal description: The journal Psychometrika is devoted to the advancement of theory and methodology for behavioral data in psychology, education and the social and behavioral sciences generally. Its coverage is offered in two sections: Theory and Methods (T&M), and Application Reviews and Case Studies (ARCS). T&M articles present original research and reviews on the development of quantitative models, statistical methods, and mathematical techniques for evaluating data from psychology, the social and behavioral sciences and related fields. Application Reviews can be integrative, drawing together disparate methodologies for applications, or comparative and evaluative, discussing advantages and disadvantages of one or more methodologies in applications. Case Studies highlight methodology that deepens understanding of substantive phenomena through more informative data analysis, or more elegant data description.
Latest articles in this journal:
Rejoinder to McNeish and Mislevy: What Does Psychological Measurement Require?
Are Sum Scores a Great Accomplishment of Psychometrics or Intuitive Test Theory?
Correction to: Generalized Structured Component Analysis Accommodating Convex Components: A Knowledge-Based Multivariate Method with Interpretable Composite Indexes.
Variational Estimation for Multidimensional Generalized Partial Credit Model.
Proof of Reliability Convergence to 1 at Rate of Spearman-Brown Formula for Random Test Forms and Irrespective of Item Pool Dimensionality.