Educational and Psychological Measurement: Latest Articles, Page 8

Evaluating the Quality of Classification in Mixture Model Simulations.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-04-01 Epub Date: 2022-04-29 DOI: 10.1177/00131644221093619

Yoona Jang, Sehee Hong

The purpose of this study was to evaluate the degree of classification quality in the basic latent class model when covariates are either included or not included in the model. To accomplish this task, Monte Carlo simulations were conducted in which the results of models with and without a covariate were compared. Based on these simulations, it was determined that models without a covariate better predicted the number of classes. These findings in general supported the use of the popular three-step approach, with its quality of classification determined to be more than 70% under various conditions of covariate effect, sample size, and quality of indicators. In light of these findings, the practical utility of evaluating classification quality is discussed relative to issues that applied researchers need to carefully consider when applying latent class models.

Citations: 0

Supervised Classes, Unsupervised Mixing Proportions: Detection of Bots in a Likert-Type Questionnaire.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-04-01 Epub Date: 2022-07-30 DOI: 10.1177/00131644221104220

Michael John Ilagan, Carl F Falk

Administering Likert-type questionnaires to online samples risks contamination of the data by malicious computer-generated random responses, also known as bots. Although nonresponsivity indices (NRIs) such as person-total correlations or Mahalanobis distance have shown great promise to detect bots, universal cutoff values are elusive. An initial calibration sample constructed via stratified sampling of bots and humans (real or simulated under a measurement model) has been used to empirically choose cutoffs with a high nominal specificity. However, a high-specificity cutoff is less accurate when the target sample has a high contamination rate. In the present article, we propose the supervised classes, unsupervised mixing proportions (SCUMP) algorithm that chooses a cutoff to maximize accuracy. SCUMP uses a Gaussian mixture model to estimate, unsupervised, the contamination rate in the sample of interest. A simulation study found that, in the absence of model misspecification on the bots, our cutoffs maintained accuracy across varying contamination rates.
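
The idea in this abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a single NRI score that is Gaussian within each class, with class means and standard deviations already estimated from a labeled calibration sample, so only the mixing proportion is estimated on the target sample. The function names, the one-dimensional setup, the simulated data, and the grid search are all illustrative assumptions.

```python
import math
import random

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def estimate_contamination(scores, human, bot, n_iter=200):
    """EM with both class densities held fixed; only the bot proportion is updated."""
    pi = 0.5
    for _ in range(n_iter):
        total = 0.0
        for x in scores:
            b = pi * normal_pdf(x, *bot)
            h = (1 - pi) * normal_pdf(x, *human)
            total += b / (b + h)          # posterior probability of "bot"
        pi = total / len(scores)
    return pi

def best_cutoff(pi, human, bot, grid):
    """Choose the cutoff c (flag as bot when score < c) with the highest expected accuracy."""
    def accuracy(c):
        sensitivity = normal_cdf(c, *bot)        # bots correctly flagged
        specificity = 1 - normal_cdf(c, *human)  # humans correctly retained
        return pi * sensitivity + (1 - pi) * specificity
    return max(grid, key=accuracy)

# Simulated target sample (assumed values): 30% bots scoring near 0, humans near 0.6.
rng = random.Random(1)
human, bot = (0.6, 0.15), (0.0, 0.15)
scores = [rng.gauss(*bot) for _ in range(600)] + [rng.gauss(*human) for _ in range(1400)]
pi_hat = estimate_contamination(scores, human, bot)
grid = [i / 100 for i in range(-50, 101)]
cutoff = best_cutoff(pi_hat, human, bot, grid)
print(round(pi_hat, 2), cutoff)
```

Because the accuracy criterion weights sensitivity and specificity by the estimated contamination rate, the chosen cutoff shifts as that rate changes, which a fixed high-specificity cutoff cannot do.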

Volume 83, Issue 2, pp. 217-239. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972131/pdf/

Citations: 0

Implementing a Standardized Effect Size in the POLYSIBTEST Procedure.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-04-01 Epub Date: 2022-02-28 DOI: 10.1177/00131644221081011

James D Weese, Ronna C Turner, Xinya Liang, Allison Ames, Brandon Crawford

A study was conducted to implement a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and to compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and large differential item functioning (DIF) for polytomous response data with three to seven response options. These are provided for researchers studying polytomous data using POLYSIBTEST software that has been published previously. The second simulation study provides one pair of standardized effect size heuristics that can be employed with items having any number of response options and compares true-positive and false-positive rates for the standardized effect size proposed by Weese with one proposed by Zwick et al. and two unstandardized classification procedures (Gierl; Golia). All four procedures retained false-positive rates generally below the level of significance at both moderate and large DIF levels. However, Weese's standardized effect size was not affected by sample size and provided slightly higher true-positive rates than the recommendations of Zwick et al. and Golia, while flagging substantially fewer items that might be characterized as having negligible DIF when compared with Gierl's suggested criterion. The proposed effect size allows for easier use and interpretation by practitioners as it can be applied to items with any number of response options and is interpreted as a difference in standard deviation units.

Volume 83, Issue 2, pp. 401-427. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972129/pdf/

Citations: 0

Summary Intervals for Model-Based Classification Accuracy and Consistency Indices.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-04-01 DOI: 10.1177/00131644221092347

Oscar Gonzalez

When scores are used to make decisions about respondents, it is of interest to estimate classification accuracy (CA), the probability of making a correct decision, and classification consistency (CC), the probability of making the same decision across two parallel administrations of the measure. Model-based estimates of CA and CC computed from the linear factor model have been recently proposed, but parameter uncertainty of the CA and CC indices has not been investigated. This article demonstrates how to estimate percentile bootstrap confidence intervals and Bayesian credible intervals for CA and CC indices, which have the added benefit of incorporating the sampling variability of the parameters of the linear factor model into the summary intervals. Results from a small simulation study suggest that percentile bootstrap confidence intervals have appropriate confidence interval coverage, although displaying a small negative bias. Bayesian credible intervals with diffuse priors, by contrast, have poor interval coverage, although their coverage improves once empirical, weakly informative priors are used. The procedures are illustrated by estimating CA and CC indices from a measure used to identify individuals low on mindfulness for a hypothetical intervention, and R code is provided to facilitate the implementation of the procedures.
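
The percentile bootstrap mentioned above can be sketched generically. This is a minimal illustration and not the article's R code: `statistic` is a placeholder for whatever function would refit the linear factor model to a resampled data set and return a CA or CC estimate, and the function name and defaults are assumptions.

```python
import random

def percentile_bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Resample rows with replacement, recompute the statistic, and take empirical percentiles."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = 1 - level
    lower = estimates[int((alpha / 2) * (n_boot - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

# Toy usage with the sample mean standing in for a CA estimate.
lo, hi = percentile_bootstrap_ci(list(range(100)), lambda xs: sum(xs) / len(xs))
print(lo, hi)
```

The interval reflects sampling variability only to the extent that `statistic` re-estimates every model parameter on each resample; computing the index from fixed parameters would understate the uncertainty.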

Volume 83, Issue 2, pp. 240-261. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972125/pdf/

Citations: 1

A New Stopping Criterion for Rasch Trees Based on the Mantel-Haenszel Effect Size Measure for Differential Item Functioning.

IF 2.1; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-02-01 Epub Date: 2022-02-28 DOI: 10.1177/00131644221077135

Mirka Henninger, Rudolf Debelak, Carolin Strobl

To detect differential item functioning (DIF), Rasch trees search for optimal split points in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. What would be more desirable is an approach that is driven more by effect size than by sample size. In order to achieve this, we suggest implementing an additional stopping criterion: the popular Educational Testing Service (ETS) classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy to use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.
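
The Mantel-Haenszel effect size and the ETS categories referred to above can be sketched as follows. This is a simplified illustration and not the authors' R implementation: it uses the standard common odds ratio and the delta transformation (-2.35 times its natural log), applies only the magnitude thresholds of 1.0 and 1.5, and omits the significance tests that the full ETS rules also require. The function names and the `stop_splitting` rule are hypothetical.

```python
import math

def mh_delta(strata):
    """strata: one tuple per matched score level,
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect)."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue                      # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den                  # Mantel-Haenszel common odds ratio
    return -2.35 * math.log(alpha_mh)     # ETS delta scale

def ets_category(delta):
    """Magnitude-only version of the ETS A/B/C scheme (significance tests omitted)."""
    size = abs(delta)
    if size < 1.0:
        return "A"    # negligible DIF
    if size < 1.5:
        return "B"    # moderate DIF
    return "C"        # large DIF

def stop_splitting(item_deltas):
    """Hypothetical stopping rule: do not split when every item shows negligible DIF."""
    return all(ets_category(d) == "A" for d in item_deltas)

print(ets_category(mh_delta([(40, 10, 20, 30)])))
```

A rule of this kind depends on the size of the DIF effect rather than on the sample size, which is the motivation given in the abstract.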

Volume 83, Issue 1, pp. 181-212. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806517/pdf/

Citations: 0

Assessing Essential Unidimensionality of Scales and Structural Coefficient Bias.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-02-01 Epub Date: 2022-02-08 DOI: 10.1177/00131644221075580

Xiaoling Liu, Pei Cao, Xinzhen Lai, Jianbing Wen, Yanyun Yang

Percentage of uncontaminated correlations (PUC), explained common variance (ECV), and omega hierarchical (ω_H) have been used to assess the degree to which a scale is essentially unidimensional and to predict structural coefficient bias when a unidimensional measurement model is fit to multidimensional data. The usefulness of these indices has been investigated in the context of bifactor models with balanced structures. This study extends the examination by focusing on bifactor models with unbalanced structures. The maximum and minimum PUC values given the total number of items and factors were derived. The usefulness of PUC, ECV, and ω_H in predicting structural coefficient bias was examined under a variety of structural regression models with bifactor measurement components. Results indicated that the performance of these indices in predicting structural coefficient bias depended on whether the bifactor measurement model had a balanced or unbalanced structure. PUC failed to predict structural coefficient bias when the bifactor model had an unbalanced structure. ECV performed reasonably well, but worse than ω_H.
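
For readers unfamiliar with the three indices, the sketch below computes them from their usual textbook definitions for an orthogonal bifactor model with standardized loadings. The function names and the example loadings are assumptions made for illustration, and the code assumes every item loads on the general factor and on exactly one group factor.

```python
def puc(group_sizes):
    """Share of item pairs whose correlation reflects only the general factor,
    i.e. pairs of items that belong to different group factors."""
    p = sum(group_sizes)
    total_pairs = p * (p - 1) / 2
    within_pairs = sum(k * (k - 1) / 2 for k in group_sizes)
    return (total_pairs - within_pairs) / total_pairs

def ecv(general, specific):
    """Explained common variance: general-factor share of all squared loadings."""
    g = sum(l ** 2 for l in general)
    s = sum(l ** 2 for group in specific for l in group)
    return g / (g + s)

def omega_h(general, specific, uniquenesses):
    """Omega hierarchical: general-factor share of total-score variance."""
    g = sum(general) ** 2
    s = sum(sum(group) ** 2 for group in specific)
    return g / (g + s + sum(uniquenesses))

# Balanced versus unbalanced structures with nine items and three group factors.
print(puc([3, 3, 3]), puc([7, 1, 1]))
```

With the same number of items and factors, the unbalanced structure yields a lower PUC than the balanced one, which shows why PUC depends on how items are distributed across group factors.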

Volume 83, Issue 1, pp. 28-47. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806515/pdf/

Citations: 0

Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-02-01 DOI: 10.1177/00131644211069906

Hung-Yu Huang

The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference, in order to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs) can provide information regarding the mastery status of test takers on latent discrete variables and are more commonly used for cognitive tests employed in educational settings than for noncognitive tests. The purpose of this study is to develop a new class of DCM for FC items under the higher-order DCM framework to meet the practical demands of simultaneously controlling for response biases and providing diagnostic classification information. By conducting a series of simulations and calibrating the model parameters with Bayesian estimation, the study shows that, in general, the model parameters can be recovered satisfactorily with the use of long tests and large samples. More attributes improve the precision of the second-order latent trait estimation in a long test, but decrease the classification accuracy and the estimation quality of the structural parameters. When statements are allowed to load on two distinct attributes in paired comparison items, the specific-attribute condition produces better parameter estimation than the overlap-attribute condition. Finally, an empirical analysis related to work-motivation measures is presented to demonstrate the applications and implications of the new model.

Volume 83, Issue 1, pp. 146-180. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/5c/8c/10.1177_00131644211069906.PMC9806518.pdf

Citations: 1

Using Simulated Annealing to Investigate Sensitivity of SEM to External Model Misspecification.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-02-01 Epub Date: 2022-01-31 DOI: 10.1177/00131644211073121

Charles L Fisk, Jeffrey R Harring, Zuchao Shen, Walter Leite, King Yiu Suen, Katerina M Marcoulides

Sensitivity analyses encompass a broad set of post-analytic techniques that are characterized as measuring the potential impact of any factor that has an effect on some output variables of a model. This research focuses on the utility of the simulated annealing algorithm to automatically identify path configurations and parameter values of omitted confounders in structural equation modeling (SEM). An empirical example based on a past published study is used to illustrate how strongly related an omitted variable must be to model variables for the conclusions of an analysis to change. The algorithm is outlined in detail and the results stemming from the sensitivity analysis are discussed.
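
A generic simulated annealing loop is sketched below to show the kind of search the abstract describes. It is not the authors' algorithm: the objective here is a toy function, whereas in a sensitivity analysis it would be a hypothetical function that refits the SEM with an omitted confounder at the candidate path values and returns, for example, the size of the focal coefficient. The Gaussian proposal, geometric cooling schedule, and all parameter names are assumptions.

```python
import math
import random

def simulated_annealing(objective, start, step=0.1, t0=1.0, cooling=0.995,
                        n_iter=5000, seed=0):
    """Minimize `objective` over a list of real-valued parameters."""
    rng = random.Random(seed)
    current = list(start)
    current_val = objective(current)
    best, best_val = list(current), current_val
    temp = t0
    for _ in range(n_iter):
        candidate = [x + rng.gauss(0, step) for x in current]
        cand_val = objective(candidate)
        delta = cand_val - current_val
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_val = candidate, cand_val
            if current_val < best_val:
                best, best_val = list(current), current_val
        temp *= cooling
    return best, best_val

# Toy objective with its minimum at (0.3, -0.2).
best, val = simulated_annealing(
    lambda p: (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2, [0.0, 0.0]
)
print(best, val)
```

Accepting occasional worse moves early on lets the search leave local optima, which matters when the objective depends on a refitted model and is not smooth in the candidate path values.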

Citations: 0

Croon's Bias-Corrected Estimation for Multilevel Structural Equation Models with Non-Normal Indicators and Model Misspecifications.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-02-01 Epub Date: 2022-03-11 DOI: 10.1177/00131644221080451

Kyle Cox, Benjamin Kelcey

Multilevel structural equation models (MSEMs) are well suited for educational research because they accommodate complex systems involving latent variables in multilevel settings. Estimation using Croon's bias-corrected factor score (BCFS) path estimation has recently been extended to MSEMs and demonstrated promise with limited sample sizes. This makes it well suited for planned educational research, which often involves sample sizes constrained by logistical and financial factors. However, the performance of BCFS estimation with MSEMs has yet to be thoroughly explored under common but difficult conditions, including the presence of non-normal indicators and model misspecifications. We conducted two simulation studies to evaluate the accuracy and efficiency of the estimator under these conditions. Results suggest that BCFS estimation of MSEMs is often more dependable, more efficient, and less biased than other estimation approaches when sample sizes are limited or model misspecifications are present, but is more susceptible to indicator non-normality. These results support, supplement, and elucidate previous literature describing the effective performance of BCFS estimation, encouraging its utilization as an alternative or supplemental estimator for MSEMs.

Volume 83, Issue 1, pp. 48-72. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806522/pdf/

Citations: 0

Resolving Dimensionality in a Child Assessment Tool: An Application of the Multilevel Bifactor Model.

IF 2.7; Tier 3 (Psychology); Q2 Mathematics, Interdisciplinary Applications

Educational and Psychological Measurement

Pub Date : 2023-02-01 Epub Date: 2022-03-07 DOI: 10.1177/00131644221082688

Hope O Akaeze, Frank R Lawrence, Jamie Heng-Chieh Wu

Multidimensionality and hierarchical data structure are common in assessment data. These design features, if not accounted for, can threaten the validity of the results and inferences generated from factor analysis, a method frequently employed to assess test dimensionality. In this article, we describe and demonstrate the application of the multilevel bifactor model to address these features in examining test dimensionality. The tool for this exposition is the Child Observation Record Advantage 1.5 (COR-Adv1.5), a child assessment instrument widely used in Head Start programs. Previous studies on this assessment tool reported highly correlated factors and did not account for the nesting of children in classrooms. Results from this study show how the flexibility of the multilevel bifactor model, together with useful model-based statistics, can be harnessed to judge the dimensionality of a test instrument and inform the interpretability of the associated factor scores.

Citations: 0