In this study, we introduced a cross-classified multidimensional nominal response model (CC-MNRM) to account for various response styles (RS) in the presence of cross-classified data. The proposed model allows slopes to vary across items and can explore impacts of observed covariates on latent constructs. We applied a recently developed variant of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm to address the computational challenge of estimating the proposed model. To demonstrate our new approach, we analyzed empirical student evaluation of teaching (SET) data collected from a large public university with three models: a CC-MNRM with RS, a CC-MNRM with no RS, and a multilevel MNRM with RS. Results indicated that the three models led to different inferences regarding the observed covariates. Additionally, in the example, ignoring/incorporating RS led to changes in student substantive scores, while the instructor substantive scores were less impacted. Misspecifying the cross-classified data structure resulted in apparent changes on instructor scores. To further evaluate the proposed modeling approach, we conducted a preliminary simulation study and observed good parameter and score recovery. We concluded this study with discussions of limitations and future research directions.
{"title":"Modeling Response Styles in Cross-Classified Data Using a Cross-Classified Multidimensional Nominal Response Model","authors":"Sijia Huang, Seungwon Chung, Carl F. Falk","doi":"10.1111/jedm.12401","DOIUrl":"10.1111/jedm.12401","url":null,"abstract":"<p>In this study, we introduced a cross-classified multidimensional nominal response model (CC-MNRM) to account for various response styles (RS) in the presence of cross-classified data. The proposed model allows slopes to vary across items and can explore impacts of observed covariates on latent constructs. We applied a recently developed variant of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm to address the computational challenge of estimating the proposed model. To demonstrate our new approach, we analyzed empirical student evaluation of teaching (SET) data collected from a large public university with three models: a CC-MNRM with RS, a CC-MNRM with no RS, and a multilevel MNRM with RS. Results indicated that the three models led to different inferences regarding the observed covariates. Additionally, in the example, ignoring/incorporating RS led to changes in student substantive scores, while the instructor substantive scores were less impacted. Misspecifying the cross-classified data structure resulted in apparent changes on instructor scores. To further evaluate the proposed modeling approach, we conducted a preliminary simulation study and observed good parameter and score recovery. We concluded this study with discussions of limitations and future research directions.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 3","pages":"486-510"},"PeriodicalIF":1.4,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we demonstrate the connection between the existing LNRT model parameters and the “level” component of profile similarity, and we define two new parameters for the LNRT model representing profile “dispersion” and “shape.” We show that while the LNRT model measures level (speed), profile dispersion and shape are conflated in model residuals, and that distinguishing them provides meaningful and useful parameters for identifying anomalous testing behavior. Results from data in a situation where many test-takers gained preknowledge of test items revealed that profile shape, not currently measured in the LNRT model, was the most sensitive response time index to the abnormal test-taking behavior patterns. Results strongly support expanding the LNRT model to measure not only each test-taker's level of speed, but also the dispersion and shape of their response time profiles.
{"title":"Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior","authors":"Gregory M. Hurtz, Regi Mucino","doi":"10.1111/jedm.12395","DOIUrl":"10.1111/jedm.12395","url":null,"abstract":"<p>The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we demonstrate the connection between the existing LNRT model parameters and the “level” component of profile similarity, and we define two new parameters for the LNRT model representing profile “dispersion” and “shape.” We show that while the LNRT model measures level (speed), profile dispersion and shape are conflated in model residuals, and that distinguishing them provides meaningful and useful parameters for identifying anomalous testing behavior. Results from data in a situation where many test-takers gained preknowledge of test items revealed that profile shape, not currently measured in the LNRT model, was the most sensitive response time index to the abnormal test-taking behavior patterns. Results strongly support expanding the LNRT model to measure not only each test-taker's level of speed, but also the dispersion and shape of their response time profiles.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 3","pages":"458-485"},"PeriodicalIF":1.4,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140939780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corinne Huggins-Manley, Anthony W. Raborn, Peggy K. Jones, Ted Myers
The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore for DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the overall testing population. We propose the nonparametric root expected proportion squared difference (REPSD) index that evaluates the statistical significance of composite group DIF for relatively small focal groups stemming from multicategorical focal variables, with decisions of statistical significance based on quasi-exact p values obtained from Monte Carlo permutations of the DIF statistic under the null distribution. We conduct a simulation to evaluate conditions under which the index produces acceptable Type I error and power rates, as well as an application to a school district assessment. Practitioners can calculate the REPSD index in a freely available package we created in the R environment.
本研究的目的是开发一种非参数 DIF 方法,该方法(a)可将焦点组直接与将用于开发报告测试得分量表的综合组进行比较,(b)允许从业人员探索与源自多类别变量的焦点组相关的 DIF,这些焦点组在整个测试人群中只占很小的比例。我们提出了非参数根期望比例平方差(REPSD)指数,该指数可评估源自多类别焦点变量的相对较小焦点组的复合组 DIF 的统计显著性,统计显著性的判定依据的是在零分布下对 DIF 统计量进行蒙特卡罗排列所获得的准精确 p 值。我们进行了一次模拟,以评估该指数在哪些条件下可产生可接受的 I 类错误和幂率,并将其应用于学区评估。实践者可以通过我们在 R 环境中创建的免费软件包计算 REPSD 指数。
{"title":"A Nonparametric Composite Group DIF Index for Focal Groups Stemming from Multicategorical Variables","authors":"Corinne Huggins-Manley, Anthony W. Raborn, Peggy K. Jones, Ted Myers","doi":"10.1111/jedm.12394","DOIUrl":"10.1111/jedm.12394","url":null,"abstract":"<p>The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore for DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the overall testing population. We propose the nonparametric root expected proportion squared difference (<i>REPSD</i>) index that evaluates the statistical significance of composite group DIF for relatively small focal groups stemming from multicategorical focal variables, with decisions of statistical significance based on quasi-exact <i>p</i> values obtained from Monte Carlo permutations of the DIF statistic under the null distribution. We conduct a simulation to evaluate conditions under which the index produces acceptable Type I error and power rates, as well as an application to a school district assessment. Practitioners can calculate the <i>REPSD</i> index in a freely available package we created in the R environment.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 3","pages":"432-457"},"PeriodicalIF":1.4,"publicationDate":"2024-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140925406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frank Goldhammer, Ulf Kroehne, Carolin Hahnel, Johannes Naumann, Paul De Boeck
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between both variables (speed-ability tradeoff, SAT). The present study measures efficiency as effective ability conditional on speed by controlling speed experimentally. Item-level time limits control the stimulus presentation time and the time window for responding (timed condition). The overall goal was to examine the construct validity of effective ability scores obtained from untimed and timed condition by comparing the effects of theory-based item properties on item difficulty. If such effects exist, the scores reflect how well the test-takers were able to cope with the theory-based requirements. A German subsample from PISA 2012 completed two reading component skills tasks (i.e., word recognition and semantic integration) with and without item-level time limits. Overall, the included linguistic item properties showed stronger effects on item difficulty in the timed than the untimed condition. In the semantic integration task, item properties explained the time required in the untimed condition. The results suggest that effective ability scores in the timed condition better reflect how well test-takers were able to cope with the theoretically relevant task demands.
{"title":"Does Timed Testing Affect the Interpretation of Efficiency Scores?—A GLMM Analysis of Reading Components","authors":"Frank Goldhammer, Ulf Kroehne, Carolin Hahnel, Johannes Naumann, Paul De Boeck","doi":"10.1111/jedm.12393","DOIUrl":"10.1111/jedm.12393","url":null,"abstract":"<p>The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between both variables (speed-ability tradeoff, SAT). The present study measures efficiency as effective ability conditional on speed by controlling speed experimentally. Item-level time limits control the stimulus presentation time and the time window for responding (timed condition). The overall goal was to examine the construct validity of effective ability scores obtained from untimed and timed condition by comparing the effects of theory-based item properties on item difficulty. If such effects exist, the scores reflect how well the test-takers were able to cope with the theory-based requirements. A German subsample from PISA 2012 completed two reading component skills tasks (i.e., word recognition and semantic integration) with and without item-level time limits. Overall, the included linguistic item properties showed stronger effects on item difficulty in the timed than the untimed condition. In the semantic integration task, item properties explained the time required in the untimed condition. The results suggest that effective ability scores in the timed condition better reflect how well test-takers were able to cope with the theoretically relevant task demands.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 3","pages":"349-377"},"PeriodicalIF":1.4,"publicationDate":"2024-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12393","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140940082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"von Davier, Alina , Mislevy, Robert J. , and Hao, Jiangang (Eds.) (2021). Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_1","authors":"Hong Jiao","doi":"10.1111/jedm.12392","DOIUrl":"10.1111/jedm.12392","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 3","pages":"560-566"},"PeriodicalIF":1.4,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140661378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthew J. Madison, Stefanie A. Wind, Lientje Maas, Kazuhiro Yamaguchi, Sergio Haab
Diagnostic classification models (DCMs) are psychometric models designed to classify examinees according to their proficiency or nonproficiency of specified latent characteristics. These models are well suited for providing diagnostic and actionable feedback to support intermediate and formative assessment efforts. Several DCMs have been developed and applied in different settings. This study examines a DCM with functional form similar to the 1-parameter logistic item response theory model. Using data from a large-scale mathematics education research study, we demonstrate and prove that the proposed DCM has measurement properties akin to the Rasch and one-parameter logistic item response theory models, including sum score sufficiency, item-free and person-free measurement, and invariant item and person ordering. We introduce some potential applications for this model, and discuss the implications and limitations of these developments, as well as directions for future research.
{"title":"A One-Parameter Diagnostic Classification Model with Familiar Measurement Properties","authors":"Matthew J. Madison, Stefanie A. Wind, Lientje Maas, Kazuhiro Yamaguchi, Sergio Haab","doi":"10.1111/jedm.12390","DOIUrl":"10.1111/jedm.12390","url":null,"abstract":"<p>Diagnostic classification models (DCMs) are psychometric models designed to classify examinees according to their proficiency or nonproficiency of specified latent characteristics. These models are well suited for providing diagnostic and actionable feedback to support intermediate and formative assessment efforts. Several DCMs have been developed and applied in different settings. This study examines a DCM with functional form similar to the 1-parameter logistic item response theory model. Using data from a large-scale mathematics education research study, we demonstrate and prove that the proposed DCM has measurement properties akin to the Rasch and one-parameter logistic item response theory models, including sum score sufficiency, item-free and person-free measurement, and invariant item and person ordering. We introduce some potential applications for this model, and discuss the implications and limitations of these developments, as well as directions for future research.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 3","pages":"408-431"},"PeriodicalIF":1.4,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140798136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, that either rely on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating this relationship. We propose the intraindividual speed-ability-relation (ISAR) model, which relies on nonstationarity of speed and ability over the course of the test. The ISAR model explicitly models intraindividual change in ability and speed within a test and assesses the intraindividual relation of speed and ability by evaluating the relationship of both latent change variables. Model estimation is good, when there are interindividual differences in speed and ability changes in the data. In empirical data from PISA, we found that the intraindividual relationship between speed and ability is not universally negative for all individuals and varies across different competence domains and countries. We discuss possible explanations for this relationship.
{"title":"Modeling the Intraindividual Relation of Ability and Speed within a Test","authors":"Augustin Mutak, Robert Krause, Esther Ulitzsch, Sören Much, Jochen Ranger, Steffi Pohl","doi":"10.1111/jedm.12391","DOIUrl":"10.1111/jedm.12391","url":null,"abstract":"<p>Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, that either rely on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating this relationship. We propose the intraindividual speed-ability-relation (ISAR) model, which relies on nonstationarity of speed and ability over the course of the test. The ISAR model explicitly models intraindividual change in ability and speed within a test and assesses the intraindividual relation of speed and ability by evaluating the relationship of both latent change variables. Model estimation is good, when there are interindividual differences in speed and ability changes in the data. In empirical data from PISA, we found that the intraindividual relationship between speed and ability is not universally negative for all individuals and varies across different competence domains and countries. We discuss possible explanations for this relationship.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 3","pages":"378-407"},"PeriodicalIF":1.4,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12391","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140630432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sun-Joo Cho, Amanda Goodwin, Matthew Naveiras, Jorge Salas
Despite the growing interest in incorporating response time data into item response models, there has been a lack of research investigating how the effect of speed on the probability of a correct response varies across different groups (e.g., experimental conditions) for various items (i.e., differential response time item analysis). Furthermore, previous research has shown a complex relationship between response time and accuracy, necessitating a functional analysis to understand the patterns that manifest from this relationship. In this study, response time data are incorporated into an item response model for two purposes: (a) to examine how individuals' speed within an experimental condition affects their response accuracy on an item, and (b) to detect the differences in individuals' speed between conditions in the presence of within-condition effects. For these two purposes, by-variable smooth functions are employed to model differential and functional response time effects by experimental condition for each item. This model is illustrated using an empirical data set to describe the effect of individuals' speed on their reading comprehension ability in two experimental conditions of reading medium (paper vs. digital) by item. A simulation study showed that the recovery of parameters and by-variable smooth functions of response time was satisfactory, and that the type I error rate and power of the test for the by-variable smooth function of response time were acceptable in conditions similar to the empirical data set. In addition, the proposed method correctly identified the range of response time where between-condition differences in the effect of response time on the probability of a correct response were accurate.
尽管将反应时间数据纳入项目反应模型的兴趣日益浓厚,但一直缺乏研究调查不同组别(如实验条件)对不同项目的速度对正确反应概率的影响是如何变化的(即差异反应时间项目分析)。此外,以往的研究表明,反应时间与正确率之间存在复杂的关系,因此有必要进行功能分析,以了解这种关系所体现的模式。在本研究中,将反应时间数据纳入项目反应模型有两个目的:(a) 检验个体在实验条件下的速度如何影响其对项目的反应准确性;(b) 在存在条件内效应的情况下,检测个体在不同条件下的速度差异。为了实现这两个目的,我们采用了副变量平滑函数来模拟每个项目在实验条件下的差异和功能反应时间效应。该模型通过一组经验数据来说明在两种阅读媒介(纸质阅读媒介和数字阅读媒介)的实验条件下,个人速度对其阅读理解能力的影响。模拟研究表明,参数和响应时间的副变平滑函数的恢复令人满意,在与经验数据集类似的条件下,响应时间的副变平滑函数的 I 类错误率和检验功率是可以接受的。此外,所提出的方法还正确确定了响应时间的范围,在该范围内,响应时间对正确响应概率的影响的条件间差异是准确的。
{"title":"Differential and Functional Response Time Item Analysis: An Application to Understanding Paper versus Digital Reading Processes","authors":"Sun-Joo Cho, Amanda Goodwin, Matthew Naveiras, Jorge Salas","doi":"10.1111/jedm.12389","DOIUrl":"10.1111/jedm.12389","url":null,"abstract":"<p>Despite the growing interest in incorporating response time data into item response models, there has been a lack of research investigating how the effect of speed on the probability of a correct response varies across different groups (e.g., experimental conditions) for various items (i.e., differential response time item analysis). Furthermore, previous research has shown a complex relationship between response time and accuracy, necessitating a functional analysis to understand the patterns that manifest from this relationship. In this study, response time data are incorporated into an item response model for two purposes: (a) to examine how individuals' speed within an experimental condition affects their response accuracy on an item, and (b) to detect the differences in individuals' speed between conditions in the presence of within-condition effects. For these two purposes, by-variable smooth functions are employed to model differential and functional response time effects by experimental condition for each item. This model is illustrated using an empirical data set to describe the effect of individuals' speed on their reading comprehension ability in two experimental conditions of reading medium (paper vs. digital) by item. A simulation study showed that the recovery of parameters and by-variable smooth functions of response time was satisfactory, and that the type I error rate and power of the test for the by-variable smooth function of response time were acceptable in conditions similar to the empirical data set. In addition, the proposed method correctly identified the range of response time where between-condition differences in the effect of response time on the probability of a correct response were accurate.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 2","pages":"219-251"},"PeriodicalIF":1.3,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12389","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140563044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tae Yeon Kwon, A. Corinne Huggins-Manley, Jonathan Templin, Mingying Zheng
In classroom assessments, examinees can often answer test items multiple times, resulting in sequential multiple-attempt data. Sequential diagnostic classification models (DCMs) have been developed for such data. As student learning processes may be aligned with a hierarchy of measured traits, this study aimed to develop a sequential hierarchical DCM (sequential HDCM), which combines a sequential DCM with the HDCM, and investigate classification accuracy of the model in the presence of hierarchies when multiple attempts are allowed in dynamic assessment. We investigated the model's impact on classification accuracy when hierarchical structures are correctly specified, misspecified, or overspecified. The results indicate that (1) a sequential HDCM accurately classified students as masters and nonmasters when the data had a hierarchical structure; (2) a sequential HDCM produced similar or slightly higher classification accuracy than nonhierarchical sequential LCDM when the data had hierarchical structures; and (3) the misspecification of the hierarchical structure of the data resulted in lower classification accuracy when the misspecified model had fewer attribute profiles than the true model. We discuss limitations and make recommendations on using the proposed model in practice. This study provides practitioners with information about the possibilities for psychometric modeling of dynamic classroom assessment data.
{"title":"Modeling Hierarchical Attribute Structures in Diagnostic Classification Models with Multiple Attempts","authors":"Tae Yeon Kwon, A. Corinne Huggins-Manley, Jonathan Templin, Mingying Zheng","doi":"10.1111/jedm.12387","DOIUrl":"10.1111/jedm.12387","url":null,"abstract":"<p>In classroom assessments, examinees can often answer test items multiple times, resulting in sequential multiple-attempt data. Sequential diagnostic classification models (DCMs) have been developed for such data. As student learning processes may be aligned with a hierarchy of measured traits, this study aimed to develop a sequential hierarchical DCM (sequential HDCM), which combines a sequential DCM with the HDCM, and investigate classification accuracy of the model in the presence of hierarchies when multiple attempts are allowed in dynamic assessment. We investigated the model's impact on classification accuracy when hierarchical structures are correctly specified, misspecified, or overspecified. The results indicate that (1) a sequential HDCM accurately classified students as masters and nonmasters when the data had a hierarchical structure; (2) a sequential HDCM produced similar or slightly higher classification accuracy than nonhierarchical sequential LCDM when the data had hierarchical structures; and (3) the misspecification of the hierarchical structure of the data resulted in lower classification accuracy when the misspecified model had fewer attribute profiles than the true model. We discuss limitations and make recommendations on using the proposed model in practice. This study provides practitioners with information about the possibilities for psychometric modeling of dynamic classroom assessment data.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 2","pages":"198-218"},"PeriodicalIF":1.3,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140562989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and how it can be addressed through moderated nonlinear factor analysis (MNLFA) model via Bayesian estimation approach to overcome limitations from the restrictive assumption. The Bayesian MNLFA approach suggested in this study better control Type I errors by freely estimating latent factor variances across different groups. Our experimentation with simulated data demonstrates that the BMNFA models outperform the existing MIMIC models, in terms of Type I error control as well as parameter recovery. The results suggest that the MNLFA models have the potential to be a superior choice to the existing MIMIC models, especially in situations where the assumption of equal latent variance assumption is not likely to hold.
研究表明,当违反潜在方差相等的假设时,多指标多原因(MIMIC)模型在检测差异项目功能(DIF)时可能会导致 I 类错误率上升。本研究解释了违反等方差假设如何对非均匀 DIF 的检测产生不利影响,以及如何通过贝叶斯估计方法的调节非线性因素分析(MNLFA)模型来克服限制性假设的局限性。本研究提出的贝叶斯 MNLFA 方法通过自由估计不同组的潜在因子方差,更好地控制了 I 类误差。我们用模拟数据进行的实验表明,BMNFA 模型在 I 类误差控制和参数恢复方面优于现有的 MIMIC 模型。结果表明,MNLFA 模型有可能成为优于现有 MIMIC 模型的选择,尤其是在等潜方差假设不可能成立的情况下。
{"title":"A Bayesian Moderated Nonlinear Factor Analysis Approach for DIF Detection under Violation of the Equal Variance Assumption","authors":"Sooyong Lee, Suhwa Han, Seung W. Choi","doi":"10.1111/jedm.12388","DOIUrl":"10.1111/jedm.12388","url":null,"abstract":"<p>Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and how it can be addressed through moderated nonlinear factor analysis (MNLFA) model via Bayesian estimation approach to overcome limitations from the restrictive assumption. The Bayesian MNLFA approach suggested in this study better control Type I errors by freely estimating latent factor variances across different groups. Our experimentation with simulated data demonstrates that the BMNFA models outperform the existing MIMIC models, in terms of Type I error control as well as parameter recovery. The results suggest that the MNLFA models have the potential to be a superior choice to the existing MIMIC models, especially in situations where the assumption of equal latent variance assumption is not likely to hold.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 2","pages":"303-324"},"PeriodicalIF":1.3,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140153862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}