
Latest Publications in Educational and Psychological Measurement

Enhancing the Detection of Social Desirability Bias Using Machine Learning: A Novel Application of Person-Fit Indices
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-05-30. DOI: 10.1177/00131644241255109
Sanaz Nazari, Walter L. Leite, A. Corinne Huggins-Manley
Social desirability bias (SDB) is a common threat to the validity of conclusions from responses to a scale or survey. There is a wide range of person-fit statistics in the literature that can be employed to detect SDB. In addition, machine learning classifiers, such as logistic regression and random forest, have the potential to distinguish between biased and unbiased responses. This study proposes a new application of these classifiers to detect SDB by considering several person-fit indices as features or predictors in the machine learning methods. The results of a Monte Carlo simulation study showed that for a single feature, applying person-fit indices directly and logistic regression led to similar classification results. However, the random forest classifier improved the classification of biased and unbiased responses substantially. Classification was improved in both logistic regression and random forest by considering multiple features simultaneously. Moreover, cross-validation indicated stable areas under the curve (AUCs) across machine learning classifiers. A didactical illustration of applying random forest to detect SDB is presented.
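The workflow the abstract describes maps naturally onto standard tooling. The sketch below is a minimal illustration rather than the authors' code: precomputed person-fit indices (the lz, U3, and HT columns are assumed placeholders, as are the simulated labels) serve as features, and cross-validated AUCs are compared for logistic regression and random forest with scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
# Hypothetical person-fit indices (e.g., lz, U3, HT), precomputed per respondent.
X = rng.normal(size=(n, 3))
# Hypothetical labels: 1 = simulated SDB responder, 0 = unbiased responder.
y = rng.integers(0, 2, size=n)

for clf in (LogisticRegression(),
            RandomForestClassifier(n_estimators=500, random_state=0)):
    aucs = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(type(clf).__name__, "mean cross-validated AUC:", aucs.mean().round(3))
```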
Citations: 0
Is Effort Moderated Scoring Robust to Multidimensional Rapid Guessing?
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-04-28. DOI: 10.1177/00131644241246749
Joseph A. Rios, Jiayi Deng
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e., RG that is linearly related to examinee ability). Specifically, EM scoring is compared with the Holman–Glas (HG) method, a multidimensional scoring approach, in terms of model fit distortion, ability parameter recovery, and omega reliability distortion. Test difficulty, the proportion of RG present within a sample, and the strength of association between ability and RG propensity were manipulated to create 80 total conditions. Overall, the results showed that EM scoring provided improved model fit compared with HG scoring when RG comprised 12% or less of all item responses. Furthermore, no significant differences in ability parameter recovery and omega reliability distortion were noted when comparing these two scoring approaches under moderate degrees of RG multidimensionality. These limited differences were largely due to the limited impact of RG on aggregated ability (bias ranged from 0.00 to 0.05 logits) and reliability (distortion was ≤ .005 units) estimates when as much as 40% of item responses in the sample data reflected RG behavior.
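To make the EM scoring procedure concrete, here is a minimal sketch under assumed values: responses whose response times fall below a rapid-guessing threshold are dropped, and ability is estimated from the remaining effortful responses with a Rasch maximum-likelihood routine. The difficulties, response times, and the 3-second cutoff are illustrative, not taken from the study.

```python
import numpy as np

def rasch_mle(x, b, max_iter=50, tol=1e-6):
    """Newton-Raphson MLE of ability theta under the Rasch model,
    given scored responses x and item difficulties b."""
    theta = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        step = np.sum(x - p) / -np.sum(p * (1.0 - p))  # gradient / Hessian
        theta -= step
        if abs(step) < tol:
            break
    return theta

b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])    # illustrative item difficulties
x = np.array([1, 1, 0, 1, 0])                # scored responses (mixed pattern;
                                             # perfect patterns have no finite MLE)
rt = np.array([12.4, 1.1, 15.8, 0.9, 10.2])  # response times in seconds
threshold = 3.0                              # assumed rapid-guessing cutoff

effortful = rt >= threshold                  # EM scoring: drop flagged responses
print("EM-scored theta:", round(rasch_mle(x[effortful], b[effortful]), 3))
```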
Citations: 0
Comparing Accuracy of Parallel Analysis and Fit Statistics for Estimating the Number of Factors With Ordered Categorical Data in Exploratory Factor Analysis
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-04-17. DOI: 10.1177/00131644241240435
Hyunjung Lee, Heining Cham
Determining the number of factors in exploratory factor analysis (EFA) is crucial because it affects the rest of the analysis and the conclusions of the study. Researchers have developed various methods for deciding the number of factors to retain in EFA, but this remains one of the most difficult decisions in the EFA. The purpose of this study is to compare the parallel analysis with the performance of fit indices that researchers have started using as another strategy for determining the optimal number of factors in EFA. The Monte Carlo simulation was conducted with ordered categorical items because there are mixed results in previous simulation studies, and ordered categorical items are common in behavioral science. The results of this study indicate that the parallel analysis and the root mean square error of approximation (RMSEA) performed well in most conditions, followed by the Tucker–Lewis index (TLI) and then by the comparative fit index (CFI). The robust corrections of CFI, TLI, and RMSEA performed better in detecting misfit underfactored models than the original fit indices. However, they did not produce satisfactory results in dichotomous data with a small sample size. Implications, limitations of this study, and future research directions are discussed.
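The retention rule that parallel analysis applies can be sketched in a few lines: retain factors whose observed eigenvalues exceed a reference quantile of eigenvalues from random data of the same dimensions. This illustration uses Pearson correlations and normal data for brevity; the ordered categorical items in the simulation would call for polychoric correlations, and counting all exceedances (rather than stopping at the first failure) is a simplification.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, quantile=95, seed=0):
    """Count factors whose observed eigenvalues exceed the reference
    quantile of eigenvalues from same-sized random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sim = np.empty((n_sims, p))
    for s in range(n_sims):
        r = rng.normal(size=(n, p))
        sim[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    ref = np.percentile(sim, quantile, axis=0)
    return int(np.sum(obs > ref))

# Illustrative data: 300 respondents, 12 items with no real factor structure.
data = np.random.default_rng(1).normal(size=(300, 12))
print("factors retained:", parallel_analysis(data))
```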
Citations: 0
Exploring the Influence of Response Styles on Continuous Scale Assessments: Insights From a Novel Modeling Approach
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-04-17. DOI: 10.1177/00131644241242789
Hung-Yu Huang
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent experience and methodological considerations. Response styles, which are frequently observed in self-reported data, reflect a propensity to answer questionnaire items in a consistent manner, regardless of the item content. These response styles have been identified as causes of skewed scale scores and biased trait inferences. In this study, we investigate the impact of response styles on individuals’ responses within a continuous scale context, with a specific emphasis on extreme response style (ERS) and acquiescence response style (ARS). Building upon the established continuous response model (CRM), we propose extensions known as the CRM-ERS and CRM-ARS. These extensions are employed to quantitatively capture individual variations in these distinct response styles. The effectiveness of the proposed models was evaluated through a series of simulation studies. Bayesian methods were employed to effectively calibrate the model parameters. The results demonstrate that both models achieve satisfactory parameter recovery. Neglecting the effects of response styles led to biased estimation, underscoring the importance of accounting for these effects. Moreover, the estimation accuracy improved with increasing test length and sample size. An empirical analysis is presented to elucidate the practical applications and implications of the proposed models.
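As a purely illustrative toy (not the CRM-ERS or CRM-ARS models themselves), the following sketch shows the distortions the two styles introduce on a 0-100 continuous scale: ERS stretches responses away from the midpoint, while ARS shifts them toward agreement. The style intensities are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.uniform(20, 80, size=5)   # content-driven responses on a 0-100 scale
ers, ars = 1.8, 8.0                  # arbitrary style intensities

ers_resp = np.clip(50 + ers * (true - 50), 0, 100)  # ERS: stretch from midpoint
ars_resp = np.clip(true + ars, 0, 100)              # ARS: uniform upward shift
print("content: ", true.round(1))
print("with ERS:", ers_resp.round(1))
print("with ARS:", ars_resp.round(1))
```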
Citations: 0
The Impact of Insufficient Effort Responses on the Order of Category Thresholds in the Polytomous Rasch Model
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-04-13. DOI: 10.1177/00131644241242806
Kuan-Yu Jin, Thomas Eckes
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable assessed. This research investigates how IER affects the intended category order of Likert-type scales, focusing on the category thresholds in the polytomous Rasch model. In a simulation study, we examined several IER patterns in datasets generated from the mixture model for IER (MMIER). The key findings were (a) random responding and overusing the non-extreme categories of a five-category scale were each associated with high frequencies of disordered category thresholds; (b) raising the IER rate from 5% to 10% led to a substantial increase in threshold disordering, particularly among easy and difficult items; (c) narrow distances between adjacent categories (0.5 logits) were associated with more frequent disordering, compared with wide distances (1.0 logits). Two real-data examples highlighted the efficiency and utility of the MMIER for detecting latent classes of respondents exhibiting different forms of IER. Under the MMIER, the frequency of disordered thresholds was reduced substantially in both examples. The discussion focuses on the practical implications of using the MMIER in survey research and points to directions for future research.
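For reference, the polytomous Rasch (partial credit) model behind the threshold analysis can be written as follows; the notation is assumed from the standard Rasch literature rather than quoted from the article:

$$
P(X_{vi} = k) = \frac{\exp \sum_{j=1}^{k} (\theta_v - \tau_{ij})}{\sum_{m=0}^{K} \exp \sum_{j=1}^{m} (\theta_v - \tau_{ij})}, \qquad k = 0, \dots, K,
$$

with the convention that the empty sum for $k = 0$ equals zero. The threshold $\tau_{ij}$ marks the point on the latent continuum where categories $j-1$ and $j$ of item $i$ are equally probable; the intended ordering is $\tau_{i1} < \tau_{i2} < \cdots < \tau_{iK}$, and "disordered thresholds" means this chain is violated for at least one adjacent pair.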
Citations: 0
Latent Variable Forests for Latent Variable Score Estimation
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-04-01. DOI: 10.1177/00131644241237502
Franz Classe, Christoph Kern
We develop a latent variable forest (LV Forest) algorithm for the estimation of latent variable scores with one or more latent variables. LV Forest estimates unbiased latent variable scores based on confirmatory factor analysis (CFA) models with ordinal and/or numerical response variables. Through parametric model restrictions paired with a nonparametric tree-based machine learning approach, LV Forest estimates latent variable scores using models that are unbiased with respect to relevant subgroups in the population. This way, estimated latent variable scores are interpretable with respect to systematic influences of covariates without being biased by these variables. By building a tree ensemble, LV Forest takes parameter heterogeneity in latent variable modeling into account to capture subgroups with both good model fit and stable parameter estimates. We apply LV Forest to simulated data with heterogeneous model parameters as well as to real large-scale survey data. We show that LV Forest improves the accuracy of score estimation if parameter heterogeneity is present.
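A strongly simplified sketch of the ensemble idea, under stated assumptions: trees grown on covariates partition the sample, a one-factor model is fit within each leaf, and each person's score is averaged over the ensemble. scikit-learn's FactorAnalysis stands in for the CFA step, and the proxy split target and sign alignment are conveniences of this sketch; the published method imposes parametric CFA restrictions that this toy omits.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 1200
Z = rng.normal(size=(n, 2))                   # covariates defining subgroups
loading = np.where(Z[:, 0] > 0, 0.9, 0.5)     # parameter heterogeneity by subgroup
eta = rng.normal(size=n)                      # true latent scores
Y = loading[:, None] * eta[:, None] + rng.normal(scale=0.6, size=(n, 6))

n_trees = 25
scores = np.zeros(n)
for t in range(n_trees):
    boot = rng.integers(0, n, size=n)         # bootstrap sample for this tree
    tree = DecisionTreeRegressor(max_leaf_nodes=4, random_state=t)
    tree.fit(Z[boot], Y[boot].mean(axis=1))   # proxy target for covariate splits
    leaves = tree.apply(Z)
    for leaf in np.unique(leaves):
        idx = leaves == leaf
        fa = FactorAnalysis(n_components=1, random_state=0).fit(Y[idx])
        s = fa.transform(Y[idx]).ravel()
        if np.corrcoef(s, Y[idx].mean(axis=1))[0, 1] < 0:
            s = -s                            # resolve factor-score sign flips
        scores[idx] += s / n_trees

print("correlation with true latent scores:",
      round(np.corrcoef(scores, eta)[0, 1], 2))
```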
Citations: 0
Fused SDT/IRT Models for Mixed-Format Exams
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-03-28. DOI: 10.1177/00131644241235333
Lawrence T. DeCarlo
A psychological framework for different types of items commonly used with mixed-format exams is proposed. A choice model based on signal detection theory (SDT) is used for multiple-choice (MC) items, whereas an item response theory (IRT) model is used for open-ended (OE) items. The SDT and IRT models are shown to share a common conceptualization in terms of latent states of “know/don’t know” at the examinee level. This in turn suggests a way to join or “fuse” the models—through the probability of knowing. A general model that fuses the SDT choice model, for MC items, with a generalized sequential logit model, for OE items, is introduced. Fitting SDT and IRT models simultaneously allows one to examine possible differences in psychological processes across the different types of items, to examine the effects of covariates in both models simultaneously, to allow for relations among the model parameters, and likely offers potential estimation benefits. The utility of the approach is illustrated with MC and OE items from large-scale international exams.
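One way to write the shared structure the abstract points to (a schematic reconstruction, not the paper's exact parameterization) is as a mixture over the latent know/don't-know state:

$$
P(\text{correct on an MC item} \mid \theta) = p_{\text{know}}(\theta) + \bigl(1 - p_{\text{know}}(\theta)\bigr)\, g_{\text{SDT}},
$$

where $p_{\text{know}}(\theta)$ is the IRT-style probability that the examinee knows the answer and $g_{\text{SDT}}$ is the probability of selecting the correct alternative given "don't know," supplied by the SDT choice model. The OE items are linked to the same $p_{\text{know}}(\theta)$ through the generalized sequential logit component, which is what fuses the two measurement models.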
Citations: 0
Examining the Dynamic of Clustering Effects in Multilevel Designs: A Latent Variable Method Application
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-02-21. DOI: 10.1177/00131644241228602
Tenko Raykov, Ahmed Haddadi, Christine DiStefano, Mohammed Alqabbaa
This note is concerned with the study of temporal development in several indices reflecting clustering effects in multilevel designs that are frequently utilized in educational and behavioral research. A latent variable method-based approach is outlined, which can be used to point and interval estimate the growth or decline in important functions of level-specific variances in two-level and three-level settings. The procedure may also be employed for the purpose of examining stability over time in clustering effects. The method can be utilized with widely circulated latent variable modeling software, and is illustrated using empirical examples.
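The clustering index whose temporal development is at issue is the intraclass correlation, ICC = τ00 / (τ00 + σ²). The sketch below estimates it at two time points with statsmodels' mixed model routine; the simulated data, column names, and shrinking cluster variance are illustrative assumptions, and the note itself works with latent variable modeling software rather than this frequentist shortcut.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for t, tau in ((1, 0.30), (2, 0.15)):        # cluster variance shrinks over time
    for c in range(60):                      # 60 clusters of 15 members each
        u = rng.normal(scale=np.sqrt(tau))   # cluster random effect
        for _ in range(15):
            rows.append({"time": t, "cluster": c, "y": u + rng.normal()})
df = pd.DataFrame(rows)

for t, d in df.groupby("time"):
    fit = smf.mixedlm("y ~ 1", d, groups=d["cluster"]).fit()
    tau00 = float(fit.cov_re.iloc[0, 0])     # between-cluster variance
    icc = tau00 / (tau00 + fit.scale)        # fit.scale = within-cluster variance
    print(f"time {t}: estimated ICC = {icc:.3f}")
```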
Citations: 0
Correcting for Extreme Response Style: Model Choice Matters.
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-02-01. Epub Date: 2023-02-17. DOI: 10.1177/00131644231155838
Martijn Schoenmakers, Jesper Tijmstra, Jeroen Vermunt, Maria Bolsinova

Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these models are however rare in the literature, especially in the context of cross-cultural comparisons, where ERS is even more relevant due to cultural differences between groups. To remedy this issue, the current article examines two frequently used IRT models that can be estimated using standard software: a multidimensional nominal response model (MNRM) and a IRTree model. Studying conceptual differences between these models reveals that they differ substantially in their conceptualization of ERS. These differences result in different category probabilities between the models. To evaluate the impact of these differences in a multigroup context, a simulation study is conducted. Our results show that when the groups differ in their average ERS, the IRTree model and MNRM can drastically differ in their conclusions about the size and presence of differences in the substantive trait between these groups. An empirical example is given and implications for the future use of both models and the conceptualization of ERS are discussed.
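The IRTree side of the comparison rests on recoding each Likert response into pseudo-items for sequential decision nodes, so that ERS can load on the extremity node. The three-node tree below is one common specification for a 5-point scale, assumed here for illustration rather than taken from the article:

```python
import numpy as np

def irtree_recode(x):
    """Recode a 5-point response into (midpoint, direction, extreme) pseudo-items;
    NaN marks nodes that are not reached for that response."""
    mid = int(x == 3)                                  # node 1: midpoint or not
    direction = np.nan if x == 3 else int(x > 3)       # node 2: agree vs. disagree
    extreme = np.nan if x == 3 else int(x in (1, 5))   # node 3: ERS loads here
    return mid, direction, extreme

for x in range(1, 6):
    print(x, irtree_recode(x))
```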

Citations: 0
Two-Method Measurement Planned Missing Data With Purposefully Selected Samples
IF 2.7, CAS Tier 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2024-01-05. DOI: 10.1177/00131644231222603
M. Xu, Jessica A. R. Logan
Research designs that include planned missing data are gaining popularity in applied education research. These methods have traditionally relied on introducing missingness into data collections using the missing completely at random (MCAR) mechanism. This study assesses whether planned missingness can also be implemented when data are instead designed to be purposefully missing based on student performance. A research design with purposefully selected missingness would allow researchers to focus all assessment efforts on a target sample, while still maintaining the statistical power of the full sample. This study introduces the method and demonstrates the performance of the purposeful missingness method within the two-method measurement planned missingness design using a Monte Carlo simulation study. Results demonstrate that the purposeful missingness method can recover parameter estimates in models with as much accuracy as the MCAR method, across multiple conditions.
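The design step itself is simple to sketch: everyone receives the inexpensive measure, and the expensive gold-standard measure is collected only for a purposefully selected, performance-based subsample, with the remainder treated as planned missing. The cutoff, variable names, and data below are illustrative; in practice the two-method model is then estimated with FIML or multiple imputation rather than stopping at the masking step.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
cheap = rng.normal(size=n)                               # administered to everyone
expensive = 0.8 * cheap + rng.normal(scale=0.6, size=n)  # gold-standard measure

df = pd.DataFrame({"cheap": cheap, "expensive": expensive})
target = df["cheap"] < df["cheap"].quantile(0.30)  # purposeful, performance-based
df.loc[~target, "expensive"] = np.nan              # planned missing by design
print("share with both measures:", round(df["expensive"].notna().mean(), 2))
```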
Citations: 0