
Latest Publications: Educational and Psychological Measurement

Is Effort Moderated Scoring Robust to Multidimensional Rapid Guessing?
IF 2.7 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-28 | DOI: 10.1177/00131644241246749
Joseph A. Rios, Jiayi Deng
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e., RG that is linearly related to examinee ability). Specifically, EM scoring is compared with the Holman–Glas (HG) method, a multidimensional scoring approach, in terms of model fit distortion, ability parameter recovery, and omega reliability distortion. Test difficulty, the proportion of RG present within a sample, and the strength of association between ability and RG propensity were manipulated to create 80 total conditions. Overall, the results showed that EM scoring provided improved model fit compared with HG scoring when RG comprised 12% or less of all item responses. Furthermore, no significant differences in ability parameter recovery and omega reliability distortion were noted when comparing these two scoring approaches under moderate degrees of RG multidimensionality. These limited differences were largely due to the limited impact of RG on aggregated ability (bias ranged from 0.00 to 0.05 logits) and reliability (distortion was ≤ .005 units) estimates when as much as 40% of item responses in the sample data reflected RG behavior.
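The core of effort-moderated scoring is simple: responses whose latencies fall below a time threshold are flagged as rapid guesses and excluded from the scoring likelihood. The sketch below illustrates this logic for a 2PL model; the fixed 3-second threshold, the item parameters, and the use of maximum likelihood scoring are illustrative assumptions, not details taken from the article.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def em_score(responses, rts, a, b, threshold=3.0):
    """Effort-moderated scoring sketch: responses faster than `threshold`
    seconds are flagged as rapid guesses and dropped from the 2PL
    likelihood; only effortful responses inform the ability estimate.
    The 3-second threshold and the item parameters are illustrative."""
    effortful = rts >= threshold                      # True = keep in likelihood
    def neg_loglik(theta):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))    # 2PL P(correct)
        ll = responses * np.log(p) + (1 - responses) * np.log(1.0 - p)
        return -np.sum(ll[effortful])                 # rapid guesses contribute nothing
    return minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x

# Toy example: 10 items; the last two were answered in under 3 seconds.
rng = np.random.default_rng(1)
a, b = np.ones(10), np.linspace(-2, 2, 10)
responses = rng.integers(0, 2, 10)
rts = np.array([12.0, 9.0, 15.0, 8.0, 20.0, 11.0, 7.0, 14.0, 1.2, 0.8])
print(em_score(responses, rts, a, b))
```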
Citations: 0
Comparing Accuracy of Parallel Analysis and Fit Statistics for Estimating the Number of Factors With Ordered Categorical Data in Exploratory Factor Analysis
IF 2.7 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-17 | DOI: 10.1177/00131644241240435
Hyunjung Lee, Heining Cham
Determining the number of factors in exploratory factor analysis (EFA) is crucial because it affects the rest of the analysis and the conclusions of the study. Researchers have developed various methods for deciding the number of factors to retain in EFA, but this remains one of the most difficult decisions in EFA. The purpose of this study is to compare the performance of parallel analysis with that of fit indices, which researchers have started using as another strategy for determining the optimal number of factors in EFA. The Monte Carlo simulation was conducted with ordered categorical items because previous simulation studies have yielded mixed results, and ordered categorical items are common in behavioral science. The results of this study indicate that parallel analysis and the root mean square error of approximation (RMSEA) performed well in most conditions, followed by the Tucker–Lewis index (TLI) and then by the comparative fit index (CFI). The robust corrections of CFI, TLI, and RMSEA performed better than the original fit indices in detecting the misfit of underfactored models. However, they did not produce satisfactory results in dichotomous data with a small sample size. Implications, limitations of this study, and future research directions are discussed.
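Parallel analysis retains a factor only if its observed eigenvalue exceeds the corresponding eigenvalue expected from random data of the same dimensions. A minimal sketch of Horn's procedure follows; for ordered categorical items, as studied above, one would substitute polychoric correlations for the Pearson correlations used here for brevity.

```python
import numpy as np

def parallel_analysis(X, n_sims=100, quantile=95, seed=0):
    """Horn's parallel analysis: retain leading factors whose observed
    eigenvalues exceed the chosen quantile of eigenvalues from random
    normal data of the same shape. Pearson correlations are used here
    only to keep the sketch short."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    sim_eig = np.empty((n_sims, p))
    for s in range(n_sims):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        sim_eig[s] = np.linalg.eigvalsh(R)[::-1]
    ref = np.percentile(sim_eig, quantile, axis=0)
    retained = 0
    for obs, r in zip(obs_eig, ref):   # count consecutive leading factors
        if obs <= r:
            break
        retained += 1
    return retained

# Toy example: 300 respondents, 6 items driven by a single factor.
rng = np.random.default_rng(42)
f = rng.standard_normal((300, 1))
X = f @ np.full((1, 6), 0.7) + 0.5 * rng.standard_normal((300, 6))
print(parallel_analysis(X))  # typically prints 1
```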
Citations: 0
Exploring the Influence of Response Styles on Continuous Scale Assessments: Insights From a Novel Modeling Approach
IF 2.7 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-17 | DOI: 10.1177/00131644241242789
Hung-Yu Huang
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent experience and methodological considerations. Response styles, which are frequently observed in self-reported data, reflect a propensity to answer questionnaire items in a consistent manner, regardless of the item content. These response styles have been identified as causes of skewed scale scores and biased trait inferences. In this study, we investigate the impact of response styles on individuals’ responses within a continuous scale context, with a specific emphasis on extreme response style (ERS) and acquiescence response style (ARS). Building upon the established continuous response model (CRM), we propose extensions known as the CRM-ERS and CRM-ARS. These extensions are employed to quantitatively capture individual variations in these distinct response styles. The effectiveness of the proposed models was evaluated through a series of simulation studies. Bayesian methods were employed to effectively calibrate the model parameters. The results demonstrate that both models achieve satisfactory parameter recovery. Neglecting the effects of response styles led to biased estimation, underscoring the importance of accounting for these effects. Moreover, the estimation accuracy improved with increasing test length and sample size. An empirical analysis is presented to elucidate the practical applications and implications of the proposed models.
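To make the ERS idea concrete on a continuous scale, the following sketch generates responses in (0, 1) from a simple logistic-normal mechanism and then lets a person-level ERS trait pull responses toward the endpoints. This is an illustrative generative mechanism under assumed functional forms, not the paper's CRM-ERS specification.

```python
import numpy as np

def simulate_continuous_responses(theta, ers, b, a=1.0, noise_sd=0.5, seed=0):
    """Illustrative generative mechanism (not the paper's CRM-ERS):
    content-driven responses in (0, 1) come from a logistic-normal
    continuous response model; a nonnegative person trait `ers` then
    pulls responses toward the endpoints 0 and 1."""
    rng = np.random.default_rng(seed)
    n, k = len(theta), len(b)
    w = a * (theta[:, None] - b[None, :]) + noise_sd * rng.standard_normal((n, k))
    z = 1.0 / (1.0 + np.exp(-w))                # 'true' response in (0, 1)
    gamma = (1.0 / (1.0 + ers))[:, None]        # gamma < 1 inflates extremity
    return 0.5 + np.sign(z - 0.5) * (2.0 * np.abs(z - 0.5)) ** gamma / 2.0

theta = np.array([0.0, 0.0])   # identical trait levels ...
ers = np.array([0.0, 3.0])     # ... but the second person is ERS-prone
b = np.linspace(-1, 1, 5)
print(simulate_continuous_responses(theta, ers, b).round(2))
```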
Citations: 0
The Impact of Insufficient Effort Responses on the Order of Category Thresholds in the Polytomous Rasch Model
IF 2.7 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-13 | DOI: 10.1177/00131644241242806
Kuan-Yu Jin, Thomas Eckes
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable assessed. This research investigates how IER affects the intended category order of Likert-type scales, focusing on the category thresholds in the polytomous Rasch model. In a simulation study, we examined several IER patterns in datasets generated from the mixture model for IER (MMIER). The key findings were (a) random responding and overusing the non-extreme categories of a five-category scale were each associated with high frequencies of disordered category thresholds; (b) raising the IER rate from 5% to 10% led to a substantial increase in threshold disordering, particularly among easy and difficult items; (c) narrow distances between adjacent categories (0.5 logits) were associated with more frequent disordering, compared with wide distances (1.0 logits). Two real-data examples highlighted the efficiency and utility of the MMIER for detecting latent classes of respondents exhibiting different forms of IER. Under the MMIER, the frequency of disordered thresholds was reduced substantially in both examples. The discussion focuses on the practical implications of using the MMIER in survey research and points to directions for future research.
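In the polytomous Rasch (partial credit) model, the probability of category k is proportional to exp(Σ_{j≤k} (θ − τ_j)); ordered thresholds τ_1 < τ_2 < τ_3 give every category a region of the ability scale where it is the most likely response, and disordering removes that region. A small sketch with assumed threshold values:

```python
import numpy as np

def pcm_category_probs(theta, thresholds):
    """Partial credit model: P(X = k) is proportional to
    exp(sum_{j<=k} (theta - tau_j)), with an empty sum for k = 0."""
    cum = np.concatenate([[0.0], np.cumsum(theta - np.asarray(thresholds))])
    expcum = np.exp(cum - cum.max())            # stabilized softmax
    return expcum / expcum.sum()

ordered = [-1.0, 0.0, 1.0]      # tau_1 < tau_2 < tau_3: every category is
disordered = [0.5, -0.5, 1.0]   # modal somewhere; here tau_1 > tau_2, so
                                # category 1 is never the most likely response
for tau in (ordered, disordered):
    print(tau, pcm_category_probs(0.0, tau).round(3))
```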
Citations: 0
Modeling Misspecification as a Parameter in Bayesian Structural Equation Models.
IF 2.1 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-01 | Epub Date: 2023-04-24 | DOI: 10.1177/00131644231165306
James Ohisei Uanhoro

Accounting for model misspecification in Bayesian structural equation models is an active area of research. We present a uniquely Bayesian approach to misspecification that models the degree of misspecification as a parameter: a parameter akin to the correlation root mean squared residual. The misspecification parameter can be interpreted on its own terms as a measure of absolute model fit and allows for comparing different models fit to the same data. By estimating the degree of misspecification simultaneously with structural parameters, the uncertainty about structural parameters reflects the degree of model misspecification. This results in a model that produces more reliable inference than extant Bayesian structural equation modeling. In addition, the approach estimates the residual covariance matrix that can be the basis for diagnosing misspecifications and updating a hypothesized model. These features are confirmed using simulation studies. Demonstrations with a variety of real-world examples show additional properties of the approach.
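The correlation root mean squared residual referenced above is the root mean square of the differences between observed and model-implied correlations over the unique off-diagonal elements. A minimal computation, with made-up correlation matrices for illustration:

```python
import numpy as np

def crmr(r_obs, r_model):
    """Correlation root mean squared residual: root mean square of the
    observed-minus-implied correlations over the unique off-diagonal
    elements. The paper's misspecification parameter is akin to this."""
    idx = np.tril_indices_from(r_obs, k=-1)     # unique off-diagonal entries
    resid = r_obs[idx] - r_model[idx]
    return np.sqrt(np.mean(resid ** 2))

# Made-up observed and model-implied correlation matrices:
r_obs = np.array([[1.00, 0.42, 0.30],
                  [0.42, 1.00, 0.25],
                  [0.30, 0.25, 1.00]])
r_model = np.array([[1.00, 0.40, 0.32],
                    [0.40, 1.00, 0.26],
                    [0.32, 0.26, 1.00]])
print(round(crmr(r_obs, r_model), 4))  # ~0.017: small residual misfit
```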

Citations: 0
Identifying Disengaged Responding in Multiple-Choice Items: Extending a Latent Class Item Response Model With Novel Process Data Indicators.
IF 2.1 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-01 | Epub Date: 2023-04-29 | DOI: 10.1177/00131644231169211
Jana Welling, Timo Gnambs, Claus H Carstensen

Disengaged responding poses a severe threat to the validity of educational large-scale assessments, because item responses from unmotivated test-takers do not reflect their actual ability. Existing identification approaches rely primarily on item response times, which bears the risk of misclassifying fast engaged or slow disengaged responses. Process data, with its rich pool of additional information on the test-taking process, could thus be used to improve existing identification approaches. In this study, three process data variables (text reread, item revisit, and answer change) were introduced as potential indicators of response engagement for multiple-choice items in a reading comprehension test. An extended latent class item response model for disengaged responding was developed by including the three new indicators as additional predictors of response engagement. In a sample of 1,932 German university students, the extended model indicated a better model fit than the baseline model, which included item response time as the only indicator of response engagement. In the extended model, both item response time and text reread were significant predictors of response engagement. However, graphical analyses revealed no systematic differences in the item and person parameter estimation or item response classification between the models. These results suggest only a marginal improvement of the identification of disengaged responding by the new indicators. Implications of these results for future research on disengaged responding with process data are discussed.
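A minimal way to see how such indicators can feed an engagement classification is a two-class mixture for a single response: prior engagement odds depend on log response time and the process indicators, engaged responses follow an item response model, and disengaged responses are random guesses. The weights, item parameters, and 2PL form below are assumptions for illustration, not the fitted model from the study.

```python
import numpy as np

def p_disengaged(correct, log_rt, reread, revisit, change,
                 theta, a, b, n_options=4,
                 w=(-1.5, 1.2, 0.8, 0.4, 0.3)):
    """Two-class mixture for one multiple-choice response, in the spirit
    of the extended model (all weights and item parameters here are
    assumed for illustration): prior engagement odds are logistic in log
    response time plus the three process indicators; engaged responses
    follow a 2PL, disengaged responses are random guesses."""
    eta = w[0] + w[1] * log_rt + w[2] * reread + w[3] * revisit + w[4] * change
    pi_eng = 1.0 / (1.0 + np.exp(-eta))               # prior P(engaged)
    p_2pl = 1.0 / (1.0 + np.exp(-a * (theta - b)))    # engaged P(correct)
    lik_eng = p_2pl if correct else 1.0 - p_2pl
    p_guess = 1.0 / n_options                         # disengaged P(correct)
    lik_dis = p_guess if correct else 1.0 - p_guess
    num = (1.0 - pi_eng) * lik_dis
    return num / (num + pi_eng * lik_eng)             # posterior P(disengaged)

# A fast wrong answer with no reread, revisit, or answer change:
print(round(p_disengaged(0, log_rt=np.log(2.0), reread=0, revisit=0,
                         change=0, theta=0.5, a=1.2, b=-0.5), 3))
```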

Citations: 0
Iterative Item Selection of Neighborhood Clusters: A Nonparametric and Non-IRT Method for Generating Miniature Computer Adaptive Questionnaires.
IF 2.1 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-01 | Epub Date: 2023-06-06 | DOI: 10.1177/00131644231176053
Yongze Xu

Questionnaires have long been an important research method in psychology. The increasing prevalence of multidimensional trait measures in psychological research has led researchers to use longer questionnaires. However, questionnaires that are too long will inevitably reduce the quality of completed questionnaires and the efficiency of data collection. Computer adaptive testing (CAT) can be used to reduce test length while preserving measurement accuracy. However, it is more often used in aptitude testing and involves a large number of parametric assumptions. Applying CAT to psychological questionnaires often requires question-specific model design and preexperimentation. The present article proposes a nonparametric and item response theory (IRT)-independent CAT algorithm. The new algorithm is simple and highly generalizable. It can be quickly used in a variety of questionnaires and tests without being limited by theoretical assumptions in different research areas. Simulation and empirical studies were conducted to demonstrate the validity of the new algorithm in aptitude tests and personality measures.
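The abstract does not spell out the selection rule, so the following sketch shows only the general flavor of a nonparametric, IRT-free adaptive algorithm: locate the calibration respondents most similar on the items answered so far (a "neighborhood"), then administer the unanswered item that is most informative within that neighborhood. Every detail here (the distance measure, the variance criterion, k = 50) is an assumption for illustration, not the authors' method.

```python
import numpy as np

def next_item(answered, responses, pool, k=50):
    """Generic nearest-neighbor adaptive selection heuristic, sketching
    the idea of nonparametric, IRT-free CAT (not the authors' published
    algorithm): find the k calibration respondents most similar on the
    items answered so far, then administer the unanswered item whose
    responses vary most within that neighborhood."""
    answered = np.asarray(answered)
    dist = np.abs(pool[:, answered] - responses).sum(axis=1)  # disagreement count
    neighbors = pool[np.argsort(dist)[:k]]
    unanswered = np.setdiff1d(np.arange(pool.shape[1]), answered)
    variances = neighbors[:, unanswered].var(axis=0)
    return unanswered[np.argmax(variances)]

# Toy calibration pool: 500 complete 0/1 response vectors on 20 items.
rng = np.random.default_rng(7)
pool = (rng.random((500, 20)) < np.linspace(0.2, 0.8, 20)).astype(float)
print(next_item(answered=[0, 5, 9], responses=np.array([1.0, 0.0, 1.0]), pool=pool))
```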

Citations: 0
A Comparison of Response Time Threshold Scoring Procedures in Mitigating Bias From Rapid Guessing Behavior.
IF 2.1 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-01 | Epub Date: 2023-04-26 | DOI: 10.1177/00131644231168398
Joseph A Rios, Jiayi Deng

Rapid guessing (RG) is a form of non-effortful responding that is characterized by short response latencies. This construct-irrelevant behavior has been shown in previous research to bias inferences concerning measurement properties and scores. To mitigate these deleterious effects, a number of response time threshold scoring procedures have been proposed, which recode RG responses (e.g., treat them as incorrect or missing, or impute probable values) and then estimate parameters for the recoded dataset using a unidimensional or multidimensional IRT model. To date, there have been limited attempts to compare these methods under the possibility that RG may be misclassified in practice. To address this shortcoming, the present simulation study compared item and ability parameter recovery for four scoring procedures by manipulating sample size, the linear relationship between RG propensity and ability, the percentage of RG responses, and the type and rate of RG misclassifications. Results demonstrated two general trends. First, across all conditions, treating RG responses as incorrect produced the largest degree of combined systematic and random error (larger than ignoring RG). Second, the remaining scoring approaches generally provided equal accuracy in parameter recovery when RG was perfectly identified; however, the multidimensional IRT approach was susceptible to increased error as misclassification rates grew. Overall, the findings suggest that recoding RG as missing and employing a unidimensional IRT model is a promising approach.
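The recoding step these procedures share can be written compactly: given a matrix of flagged RG responses, either score them incorrect, set them to missing, or impute a probable value before calibrating the IRT model. The Bernoulli-at-chance imputation below is one simple choice, not necessarily the imputation variant evaluated in the study.

```python
import numpy as np

def recode_rg(responses, rg_flags, how="missing", p_chance=0.25, seed=0):
    """Recode flagged rapid-guessing responses before IRT calibration.
    `how` selects a treatment of the kind compared in the study: score
    them incorrect, set them to missing, or impute a probable value
    (here a Bernoulli draw at the chance rate, one simple choice)."""
    out = responses.astype(float).copy()
    if how == "incorrect":
        out[rg_flags] = 0.0
    elif how == "missing":
        out[rg_flags] = np.nan
    elif how == "impute":
        rng = np.random.default_rng(seed)
        out[rg_flags] = (rng.random(rg_flags.sum()) < p_chance).astype(float)
    return out

responses = np.array([[1, 0, 1, 1], [0, 1, 1, 0]])
rg_flags = np.array([[False, False, False, True], [False, True, False, False]])
for how in ("incorrect", "missing", "impute"):
    print(how, recode_rg(responses, rg_flags, how))
```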

Citations: 0
A Note on Comparing the Bifactor and Second-Order Factor Models: Is the Bayesian Information Criterion a Routinely Dependable Index for Model Selection?
IF 2.1 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-01 | Epub Date: 2023-04-21 | DOI: 10.1177/00131644231166348
Tenko Raykov, Christine DiStefano, Lisa Calvocoressi

This note demonstrates that the widely used Bayesian Information Criterion (BIC) need not be generally viewed as a routinely dependable index for model selection when the bifactor and second-order factor models are examined as rival means for data description and explanation. To this end, we use an empirically relevant setting with multidimensional measuring instrument components, where the bifactor model is found consistently inferior to the second-order model in terms of the BIC even though the data on a large number of replications at different sample sizes were generated following the bifactor model. We therefore caution researchers that routine reliance on the BIC for the purpose of discriminating between these two widely used models may not always lead to correct decisions with respect to model choice.
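For reference, BIC trades off fit against complexity as BIC = −2 log L + k log n, with smaller values preferred, so the second-order model's smaller parameter count can outweigh the bifactor model's likelihood advantage. A toy computation with hypothetical fit results:

```python
import numpy as np

def bic(loglik, n_params, n_obs):
    """BIC = -2 log L + k log n; the model with the smaller BIC is preferred."""
    return -2.0 * loglik + n_params * np.log(n_obs)

# Hypothetical fit results for the two rival models at n = 500: the
# bifactor model fits better (-10480 > -10495) but pays a larger penalty.
print(bic(loglik=-10480.0, n_params=36, n_obs=500))  # bifactor
print(bic(loglik=-10495.0, n_params=27, n_obs=500))  # second-order (smaller BIC)
```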

Citations: 0
Latent Variable Forests for Latent Variable Score Estimation
IF 2.7 | Psychology (CAS Tier 3) | Q1 Social Sciences | Pub Date: 2024-04-01 | DOI: 10.1177/00131644241237502
Franz Classe, Christoph Kern
We develop a latent variable forest (LV Forest) algorithm for the estimation of latent variable scores with one or more latent variables. LV Forest estimates unbiased latent variable scores based on confirmatory factor analysis (CFA) models with ordinal and/or numerical response variables. Through parametric model restrictions paired with a nonparametric tree-based machine learning approach, LV Forest estimates latent variable scores using models that are unbiased with respect to relevant subgroups in the population. This way, estimated latent variable scores are interpretable with respect to systematic influences of covariates without being biased by these variables. By building a tree ensemble, LV Forest takes parameter heterogeneity in latent variable modeling into account to capture subgroups with both good model fit and stable parameter estimates. We apply LV Forest to simulated data with heterogeneous model parameters as well as to real large-scale survey data. We show that LV Forest improves the accuracy of score estimation if parameter heterogeneity is present.
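A heavily simplified sketch of one tree-growing step in this spirit: try candidate splits on a covariate, fit a crude one-factor model in each child node, and keep the split whose children show the least combined misfit; factor scores would then be estimated within the resulting subgroups. The one-factor fit via the first principal axis and the misfit criterion are simplifications for illustration, not the published LV Forest algorithm.

```python
import numpy as np

def one_factor_fit(X):
    """Crude one-factor fit for a node: loadings from the first principal
    axis; misfit = sum of squared off-diagonal residual correlations."""
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)                     # ascending eigenvalues
    lam = np.sqrt(vals[-1]) * vecs[:, -1]
    resid = R - (np.outer(lam, lam) + np.diag(1.0 - lam**2))
    return np.sum(np.tril(resid, -1) ** 2)

def best_split(X, z, min_node=30):
    """One tree-growing step: pick the covariate threshold whose child
    nodes, each with its own one-factor model, minimize combined misfit."""
    best_misfit, best_cut = np.inf, None
    for c in np.quantile(z, [0.25, 0.5, 0.75]):
        left, right = X[z <= c], X[z > c]
        if min(len(left), len(right)) < min_node:
            continue
        misfit = one_factor_fit(left) + one_factor_fit(right)
        if misfit < best_misfit:
            best_misfit, best_cut = misfit, c
    return best_misfit, best_cut

# Toy data: loadings differ between the z <= 0.5 and z > 0.5 subgroups.
rng = np.random.default_rng(3)
z = rng.random(400)
load = np.where(z[:, None] <= 0.5, 0.8, 0.3)
f = rng.standard_normal((400, 1))
X = (f * load) @ np.ones((1, 6)) + rng.standard_normal((400, 6))
print(best_split(X, z))   # the chosen cut should land near 0.5
```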
Citations: 0