Using Item Scores and Response Times to Detect Item Compromise in Computerized Adaptive Testing.
Pub Date: 2025-09-14 | DOI: 10.1177/00131644251368335
Chansoon Lee, Kylie Gorney, Jianshen Chen
Sequential procedures have been shown to be effective methods for real-time detection of compromised items in computerized adaptive testing. In this study, we propose three item response theory-based sequential procedures that involve the use of item scores and response times (RTs). The first procedure requires that either the score-based statistic or the RT-based statistic be extreme, the second procedure requires that both the score-based statistic and the RT-based statistic be extreme, and the third procedure requires that a combined score and RT-based statistic be extreme. Results suggest that the third procedure is the most promising, providing a reasonable balance between the false-positive rate and the true-positive rate while also producing relatively short lag times across a wide range of simulation conditions.
{"title":"Using Item Scores and Response Times to Detect Item Compromise in Computerized Adaptive Testing.","authors":"Chansoon Lee, Kylie Gorney, Jianshen Chen","doi":"10.1177/00131644251368335","DOIUrl":"10.1177/00131644251368335","url":null,"abstract":"<p><p>Sequential procedures have been shown to be effective methods for real-time detection of compromised items in computerized adaptive testing. In this study, we propose three item response theory-based sequential procedures that involve the use of item scores and response times (RTs). The first procedure requires that either the score-based statistic or the RT-based statistic be extreme, the second procedure requires that both the score-based statistic and the RT-based statistic be extreme, and the third procedure requires that a combined score and RT-based statistic be extreme. Results suggest that the third procedure is the most promising, providing a reasonable balance between the false-positive rate and the true-positive rate while also producing relatively short lag times across a wide range of simulation conditions.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251368335"},"PeriodicalIF":2.3,"publicationDate":"2025-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12433998/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145074512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimensionality Assessment in Forced-Choice Questionnaires: First Steps Toward an Exploratory Framework.
Pub Date: 2025-09-08 | DOI: 10.1177/00131644251358226
Diego F Graña, Rodrigo S Kreitchmann, Miguel A Sorrel, Luis Eduardo Garrido, Francisco J Abad
Forced-choice (FC) questionnaires have gained increasing attention as a strategy to reduce social desirability in self-reports, supported by advancements in confirmatory models that address the ipsativity of FC test scores. However, these models assume a known dimensionality and structure, which can be overly restrictive or fail to fit the data adequately. Consequently, exploratory models can be required, with accurate dimensionality assessment as a critical first step. FC questionnaires also pose unique challenges for dimensionality assessment, due to their inherently complex multidimensional structures. Despite this, no prior studies have systematically evaluated dimensionality assessment methods for FC data. To fill this gap, the present study examines five commonly used methods: the Kaiser Criterion, Empirical Kaiser Criterion, Parallel Analysis (PA), Hull Method, and Exploratory Graph Analysis. A Monte Carlo simulation study was conducted, manipulating key design features of FC questionnaires, such as the number of dimensions, items per dimension, response formats (e.g., binary vs. graded), and block composition (e.g., inclusion of heteropolar and unidimensional blocks), as well as factor loadings, inter-factor correlations, and sample size. Results showed that the Empirical Kaiser Criterion and PA methods outperformed the others, achieving higher accuracy and lower bias. Performance improved particularly when heteropolar or unidimensional blocks were included or when the questionnaire length increased. These findings emphasize the importance of thoughtful FC test design and provide practical recommendations for improving dimensionality assessment in this format.
{"title":"Dimensionality Assessment in Forced-Choice Questionnaires: First Steps Toward an Exploratory Framework.","authors":"Diego F Graña, Rodrigo S Kreitchmann, Miguel A Sorrel, Luis Eduardo Garrido, Francisco J Abad","doi":"10.1177/00131644251358226","DOIUrl":"10.1177/00131644251358226","url":null,"abstract":"<p><p>Forced-choice (FC) questionnaires have gained increasing attention as a strategy to reduce social desirability in self-reports, supported by advancements in confirmatory models that address the ipsativity of FC test scores. However, these models assume a known dimensionality and structure, which can be overly restrictive or fail to fit the data adequately. Consequently, exploratory models can be required, with accurate dimensionality assessment as a critical first step. FC questionnaires also pose unique challenges for dimensionality assessment, due to their inherently complex multidimensional structures. Despite this, no prior studies have systematically evaluated dimensionality assessment methods for FC data. To fill this gap, the present study examines five commonly used methods: the Kaiser Criterion, Empirical Kaiser Criterion, Parallel Analysis (PA), Hull Method, and Exploratory Graph Analysis. A Monte Carlo simulation study was conducted, manipulating key design features of FC questionnaires, such as the number of dimensions, items per dimension, response formats (e.g., binary vs. graded), and block composition (e.g., inclusion of heteropolar and unidimensional blocks), as well as factor loadings, inter-factor correlations, and sample size. Results showed that the Maximal Kaiser Criterion and PA methods outperformed the others, achieving higher accuracy and lower bias. Performance improved particularly when heteropolar or unidimensional blocks were included or when the questionnaire length increased. These findings emphasize the importance of thoughtful FC test design and provide practical recommendations for improving dimensionality assessment in this format.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251358226"},"PeriodicalIF":2.3,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12420653/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145039408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing Calibration Bias for Person Fit Assessment by Mixture Model Expansion.
Pub Date: 2025-09-06 | DOI: 10.1177/00131644251364252
Johan Braeken, Saskia van Laar
Measurement appropriateness concerns the question of whether the test or survey scale under consideration can provide a valid measure for a specific individual. An aberrant item response pattern would provide internal counterevidence against using the test/scale for this person, whereas a more typical item response pattern would imply a fit of the measure to the person. Traditional approaches, including the popular Lz person fit statistic, are hampered by their two-stage estimation procedure and the fact that the fit for the person is determined based on the model calibrated on data that include the misfitting persons. This calibration bias creates suboptimal conditions for person fit assessment. Solutions have been sought through the derivation of approximating bias-correction formulas and/or iterative purification procedures. Yet, here we discuss an alternative one-stage solution that involves calibrating a model expansion of the measurement model that includes a mixture component for target aberrant response patterns. A simulation study evaluates the approach under the most unfavorable and least-studied conditions for person fit indices: short polytomous survey scales similar to those found in large-scale educational assessments such as the Program for International Student Assessment or the Trends in International Mathematics and Science Study.
{"title":"Reducing Calibration Bias for Person Fit Assessment by Mixture Model Expansion.","authors":"Johan Braeken, Saskia van Laar","doi":"10.1177/00131644251364252","DOIUrl":"10.1177/00131644251364252","url":null,"abstract":"<p><p>Measurement appropriateness concerns the question of whether the test or survey scale under consideration can provide a valid measure for a specific individual. An aberrant item response pattern would provide internal counterevidence against using the test/scale for this person, whereas a more typical item response pattern would imply a fit of the measure to the person. Traditional approaches, including the popular Lz person fit statistic, are hampered by their two-stage estimation procedure and the fact that the fit for the person is determined based on the model calibrated on data that include the misfitting persons. This calibration bias creates suboptimal conditions for person fit assessment. Solutions have been sought through the derivation of approximating bias-correction formulas and/or iterative purification procedures. Yet, here we discuss an alternative one-stage solution that involves calibrating a model expansion of the measurement model that includes a mixture component for target aberrant response patterns. A simulation study evaluates the approach under the most unfavorable and least-studied conditions for person fit indices, short polytomous survey scales, similar to those found in large-scale educational assessments such as the Program for International Student Assessment or Trends in Mathematics and Science Study.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251364252"},"PeriodicalIF":2.3,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12413990/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145023055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proportion Explained Component Variance in Second-Order Scales: A Note on a Latent Variable Modeling Approach.
Pub Date: 2025-08-23 | DOI: 10.1177/00131644251350536
Tenko Raykov, Christine DiStefano, Yusuf Ransome
A procedure is outlined for evaluating the proportion of component variance explained by the underlying trait in behavioral scales with a second-order structure. The resulting index of variance accounted for across all scale components is a useful and informative complement to the conventional omega-hierarchical coefficient as well as to the proportion of explained component correlation. A point and interval estimation method is described for the discussed index, which utilizes a confirmatory factor analysis approach within the latent variable modeling methodology. The procedure can be used with widely available software and is illustrated on data.
{"title":"Proportion Explained Component Variance in Second-Order Scales: A Note on a Latent Variable Modeling Approach.","authors":"Tenko Raykov, Christine DiStefano, Yusuf Ransome","doi":"10.1177/00131644251350536","DOIUrl":"https://doi.org/10.1177/00131644251350536","url":null,"abstract":"<p><p>A procedure for evaluation of the proportion explained component variance by the underlying trait in behavioral scales with second-order structure is outlined. The resulting index of accounted for variance over all scale components is a useful and informative complement to the conventional omega-hierarchical coefficient as well as the proportion of explained component correlation. A point and interval estimation method is described for the discussed index, which utilizes a confirmatory factor analysis approach within the latent variable modeling methodology. The procedure can be used with widely available software and is illustrated on data.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251350536"},"PeriodicalIF":2.3,"publicationDate":"2025-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12374956/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144946890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to Improve the Regression Factor Score Predictor When Individuals Have Different Factor Loadings.
Pub Date: 2025-08-15 | DOI: 10.1177/00131644251347530
André Beauducel, Norbert Hilger, Anneke C Weide
Previous research has shown that ignoring individual differences in factor loadings in conventional factor models may reduce the determinacy of factor score predictors. Therefore, the aim of the present study is to propose a heterogeneous regression factor score predictor (HRFS) with larger determinacy than the conventional regression factor score predictor (RFS) when individuals have different factor loadings. First, a method for the estimation of individual loadings is proposed. The individual loading estimates are used to compute the HRFS. Then, a binomial test for loading heterogeneity of a factor is proposed to compute the HRFS only when the test is significant. Otherwise, the conventional RFS should be used. A simulation study reveals that the HRFS has larger determinacy than the conventional RFS in populations with substantial loading heterogeneity. An empirical example based on subsamples drawn randomly from a large sample of Big Five Markers indicates that the determinacy can be improved for the factor emotional stability when the HRFS is computed.
{"title":"How to Improve the Regression Factor Score Predictor When Individuals Have Different Factor Loadings.","authors":"André Beauducel, Norbert Hilger, Anneke C Weide","doi":"10.1177/00131644251347530","DOIUrl":"10.1177/00131644251347530","url":null,"abstract":"<p><p>Previous research has shown that ignoring individual differences of factor loadings in conventional factor models may reduce the determinacy of factor score predictors. Therefore, the aim of the present study is to propose a heterogeneous regression factor score predictor (HRFS) with larger determinacy than the conventional regression factor score predictor (RFS) when individuals have different factor loadings. First, a method for the estimation of individual loadings is proposed. The individual loading estimates are used to compute the HRFS. Then, a binomial test for loading heterogeneity of a factor is proposed to compute the HRFS only when the test is significant. Otherwise, the conventional RFS should be used. A simulation study reveals that the HRFS has larger determinacy than the conventional RFS in populations with substantial loading heterogeneity. An empirical example based on subsamples drawn randomly from a large sample of Big Five Markers indicates that the determinacy can be improved for the factor emotional stability when the HRFS is computed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251347530"},"PeriodicalIF":2.3,"publicationDate":"2025-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356820/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144872005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Comparison of LTA Models with and Without Residual Correlation in Estimating Transition Probabilities.
Pub Date: 2025-08-14 | DOI: 10.1177/00131644251358530
Na Yeon Lee, Sojin Yoon, Sehee Hong
In longitudinal mixture models such as latent transition analysis (LTA), identical items are often measured repeatedly across multiple time points to define latent classes, and individuals' similar response patterns across those time points give rise to residual correlations. Therefore, this study hypothesized that an LTA model assuming residual correlations among indicator variables measured repeatedly across multiple time points would provide more accurate estimates of transition probabilities than a traditional LTA model. To test this hypothesis, a Monte Carlo simulation was conducted to generate data both with and without specified residual correlations among the repeatedly measured indicator variables, and the two LTA models (one that accounted for residual correlations and one that did not) were compared. This study included transition probabilities, numbers of indicator variables, sample sizes, and levels of residual correlations as the simulation conditions. The estimation performances were compared based on parameter estimate bias, mean squared error, and coverage. The results demonstrate that LTA with residual correlations outperforms traditional LTA in estimating transition probabilities, and the differences between the two models become prominent when the residual correlation is .3 or higher. This research integrates the characteristics of longitudinal data in an LTA simulation study and suggests an improved version of LTA estimation.
{"title":"A Comparison of LTA Models with and Without Residual Correlation in Estimating Transition Probabilities.","authors":"Na Yeon Lee, Sojin Yoon, Sehee Hong","doi":"10.1177/00131644251358530","DOIUrl":"10.1177/00131644251358530","url":null,"abstract":"<p><p>In longitudinal mixture models like latent transition analysis (LTA), identical items are often repeatedly measured across multiple time points to define latent classes and individuals' similar response patterns across multiple time points, which attributes to residual correlations. Therefore, this study hypothesized that an LTA model assuming residual correlations among indicator variables measured repeatedly across multiple time points would provide more accurate estimates of transition probabilities than a traditional LTA model. To test this hypothesis, a Monte Carlo simulation was conducted to generate data both with and without specified residual correlations among the repeatedly measured indicator variables, and the two LTA models-one that accounted for residual correlations and one that did not-were compared. This study included transition probabilities, numbers of indicator variables, sample sizes, and levels of residual correlations as the simulation conditions. The estimation performances were compared based on parameter estimate bias, mean squared error, and coverage. The results demonstrate that LTA with residual correlations outperforms traditional LTA in estimating transition probabilities, and the differences between the two models become prominent when the residual correlation is .3 or higher. This research integrates the characteristics of longitudinal data in an LTA simulation study and suggests an improved version of LTA estimation.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251358530"},"PeriodicalIF":2.3,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356818/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144872004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Dominant Trait Profile Method of Scoring Multidimensional Forced-Choice Questionnaires.
Pub Date: 2025-08-14 | DOI: 10.1177/00131644251360386
Dimiter M Dimitrov
Proposed is a new method of scoring multidimensional forced-choice (MFC) questionnaires referred to as the dominant trait profile (DTP) method. The DTP method identifies a dominant response vector (DRV) for each trait: a vector of binary scores for preferences in item pairs within MFC blocks from the perspective of a respondent for whom the trait under consideration dominates over the other traits being measured. The respondents' observed response vectors are matched to the DRV for each trait to produce (1/0) matching scores that are then analyzed via latent trait modeling, with scaling options (a) bounded D-scale (from 0 to 1), or (b) item response theory logit scale. The DTP method allows for the comparison of individuals on a trait of interest, as well as their standing in relation to a dominant trait "standard" (criterion). The study results indicate that DTP-based trait estimates are highly correlated with those produced by the popular Thurstonian item response theory model and the Zinnes and Griggs pairwise preference item response theory model, while avoiding the complexity of their designs and some computational issues.
{"title":"The Dominant Trait Profile Method of Scoring Multidimensional Forced-Choice Questionnaires.","authors":"Dimiter M Dimitrov","doi":"10.1177/00131644251360386","DOIUrl":"10.1177/00131644251360386","url":null,"abstract":"<p><p>Proposed is a new method of scoring multidimensional forced-choice (MFC) questionnaires referred to as the dominant trait profile (DTP) method. The DTP method identifies a dominant response vector (DRV) for each trait-a vector of binary scores for preferences in item pairs within MFC blocks from the perspective of a respondent for whom the trait under consideration dominates over the other traits being measured. The respondents' observed response vectors are matched to the DRV for each trait to produce (1/0) matching scores that are then analyzed via latent trait modeling, with scaling options (a) bounded D-scale (from 0 to 1), or (b) item response theory logit scale. The DTP method allows for the comparison of individuals on a trait of interest, as well as their standing in relation to a dominant trait \"standard\" (criterion). The study results indicate that DTP-based trait estimates are highly correlated with those produced by the popular Thurstonian item response theory model and the Zinnes and Griggs pairwise preference item response theory model, while avoiding the complexity of their designs and some computations issues.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251360386"},"PeriodicalIF":2.3,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356822/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144872007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human Expertise and Large Language Model Embeddings in the Content Validity Assessment of Personality Tests.
Pub Date: 2025-08-14 | DOI: 10.1177/00131644251355485
Nicola Milano, Michela Ponticorvo, Davide Marocco
In this article, we explore the application of Large Language Models (LLMs) in assessing the content validity of psychometric instruments, focusing on the Big Five Questionnaire (BFQ) and Big Five Inventory (BFI). Content validity, a cornerstone of test construction, ensures that psychological measures adequately cover their intended constructs. Using both human expert evaluations and advanced LLMs, we compared the accuracy of semantic item-construct alignment. Graduate psychology students employed the Content Validity Ratio to rate test items, forming the human baseline. In parallel, state-of-the-art LLMs, including multilingual and fine-tuned models, analyzed item embeddings to predict construct mappings. The results reveal distinct strengths and limitations of human and AI approaches. Human validators excelled in aligning the behaviorally rich BFQ items, while LLMs performed better with the linguistically concise BFI items. Training strategies significantly influenced LLM performance, with models tailored for lexical relationships outperforming general-purpose LLMs. Here we highlight the complementary potential of hybrid validation systems that integrate human expertise and AI precision. The findings underscore the transformative role of LLMs in psychological assessment, paving the way for scalable, objective, and robust test development methodologies.
{"title":"Human Expertise and Large Language Model Embeddings in the Content Validity Assessment of Personality Tests.","authors":"Nicola Milano, Michela Ponticorvo, Davide Marocco","doi":"10.1177/00131644251355485","DOIUrl":"10.1177/00131644251355485","url":null,"abstract":"<p><p>In this article, we explore the application of Large Language Models (LLMs) in assessing the content validity of psychometric instruments, focusing on the Big Five Questionnaire (BFQ) and Big Five Inventory (BFI). Content validity, a cornerstone of test construction, ensures that psychological measures adequately cover their intended constructs. Using both human expert evaluations and advanced LLMs, we compared the accuracy of semantic item-construct alignment. Graduate psychology students employed the Content Validity Ratio to rate test items, forming the human baseline. In parallel, state-of-the-art LLMs, including multilingual and fine-tuned models, analyzed item embeddings to predict construct mappings. The results reveal distinct strengths and limitations of human and AI approaches. Human validators excelled in aligning the behaviorally rich BFQ items, while LLMs performed better with the linguistically concise BFI items. Training strategies significantly influenced LLM performance, with models tailored for lexical relationships outperforming general-purpose LLMs. Here we highlight the complementary potential of hybrid validation systems that integrate human expertise and AI precision. The findings underscore the transformative role of LLMs in psychological assessment, paving the way for scalable, objective, and robust test development methodologies.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251355485"},"PeriodicalIF":2.3,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356817/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144872006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The One-Parameter Logistic Model Can Be True With Zero Probability for a Unidimensional Measuring Instrument: How One Could Go Wrong Removing Items Not Satisfying the Model.
Pub Date: 2025-08-06 | DOI: 10.1177/00131644251345120
Tenko Raykov, Bingsheng Zhang
This note is concerned with the chance of the one-parameter logistic (1PL) model or the Rasch model being true for a unidimensional multi-item measuring instrument. It is pointed out that if a single dimension underlies a scale consisting of dichotomous items, then the probability of either model being correct for that scale can be zero. The question is then addressed of what the consequences could be of removing items that do not follow these models. Using a large number of simulated data sets, a pair of empirically relevant settings is presented where such item elimination can be problematic. Specifically, dropping items from a unidimensional instrument because they do not satisfy the 1PL model, or the Rasch model, can yield potentially seriously misleading ability estimates with increased standard errors and prediction error with respect to the latent trait. Implications for educational and behavioral research are discussed.
{"title":"The One-Parameter Logistic Model Can Be True With Zero Probability for a Unidimensional Measuring Instrument: How One Could Go Wrong Removing Items Not Satisfying the Model.","authors":"Tenko Raykov, Bingsheng Zhang","doi":"10.1177/00131644251345120","DOIUrl":"10.1177/00131644251345120","url":null,"abstract":"<p><p>This note is concerned with the chance of the one-parameter logistic (1PL-) model or the Rasch model being true for a unidimensional multi-item measuring instrument. It is pointed out that if a single dimension underlies a scale consisting of dichotomous items, then the probability of either model being correct for that scale can be zero. The question is then addressed, what the consequences could be of removing items not following these models. Using a large number of simulated data sets, a pair of empirically relevant settings is presented where such item elimination can be problematic. Specifically, dropping items from a unidimensional instrument due to them not satisfying the 1PL-model, or the Rasch model, can yield potentially seriously misleading ability estimates with increased standard errors and prediction error with respect to the latent trait. Implications for educational and behavioral research are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251345120"},"PeriodicalIF":2.3,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12328337/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144816062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model-Based Person Fit Statistics Applied to the Wechsler Adult Intelligence Scale IV.
Pub Date: 2025-08-03 | DOI: 10.1177/00131644251339444
Jared M Block, Steven P Reise, Keith F Widaman, Amanda K Montoya, David W Loring, Laura Glass Umfleet, Russell M Bauer, Joseph M Gullett, Brittany Wolff, Daniel L Drane, Kristen Enriquez, Robert M Bilder
An important task in clinical neuropsychology is to evaluate whether scores obtained on a test battery, such as the Wechsler Adult Intelligence Scale Fourth Edition (WAIS-IV), can be considered "credible" or "valid" for a particular patient. Such evaluations are typically made based on responses to performance validity tests (PVTs). As a complement to PVTs, we propose that WAIS-IV profiles also be evaluated using a residual-based M-distance (d_ri^2) person fit statistic. Large d_ri^2 values flag profiles that are inconsistent with the factor analytic model underlying the interpretation of test scores. We first established a well-fitting model with four correlated factors for 10 core WAIS-IV subtests derived from the standardization sample. Based on this model, we then performed a Monte Carlo simulation to evaluate whether a hypothesized sampling distribution for d_ri^2 was accurate and whether d_ri^2 was computable, under different degrees of missing subtest scores. We found that when the number of subtests administered was less than 8, d_ri^2 could not be computed around 25% of the time. When computable, d_ri^2 conformed to a χ² distribution with degrees of freedom equal to the number of tests minus the number of factors. Demonstration of the d_ri^2 index in a large sample of clinical cases was also provided. Findings highlight the potential utility of the d_ri^2 index as an adjunct to PVTs, offering clinicians an additional method to evaluate WAIS-IV test profiles and improve the accuracy of neuropsychological evaluations.
{"title":"Model-Based Person Fit Statistics Applied to the Wechsler Adult Intelligence Scale IV.","authors":"Jared M Block, Steven P Reise, Keith F Widaman, Amanda K Montoya, David W Loring, Laura Glass Umfleet, Russell M Bauer, Joseph M Gullett, Brittany Wolff, Daniel L Drane, Kristen Enriquez, Robert M Bilder","doi":"10.1177/00131644251339444","DOIUrl":"10.1177/00131644251339444","url":null,"abstract":"<p><p>An important task in clinical neuropsychology is to evaluate whether scores obtained on a test battery, such as the Wechsler Adult Intelligence Scale Fourth Edition (WAIS-IV), can be considered \"credible\" or \"valid\" for a particular patient. Such evaluations are typically made based on responses to performance validity tests (PVTs). As a complement to PVTs, we propose that WAIS-IV profiles also be evaluated using a residual-based M-distance ( <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> ) person fit statistic. Large <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> values flag profiles that are inconsistent with the factor analytic model underlying the interpretation of test scores. We first established a well-fitting model with four correlated factors for 10 core WAIS-IV subtests derived from the standardization sample. Based on this model, we then performed a Monte Carlo simulation to evaluate whether a hypothesized sampling distribution for <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> was accurate and whether <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> was computable, under different degrees of missing subtest scores. We found that when the number of subtests administered was less than 8, <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> could not be computed around 25% of the time. When computable, <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> conformed to a <math> <mrow> <msup><mrow><mi>χ</mi></mrow> <mrow><mn>2</mn></mrow> </msup> </mrow> </math> distribution with degrees of freedom equal to the number of tests minus the number of factors. Demonstration of the <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> index in a large sample of clinical cases was also provided. 
Findings highlight the potential utility of the <math> <mrow> <msubsup><mrow><mi>d</mi></mrow> <mrow><mi>ri</mi></mrow> <mrow><mn>2</mn></mrow> </msubsup> </mrow> </math> index as an adjunct to PVTs, offering clinicians an additional method to evaluate WAIS-IV test profiles and improve the accuracy of neuropsychological evaluations.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251339444"},"PeriodicalIF":2.3,"publicationDate":"2025-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12321812/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144793789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
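A generic residual-based M-distance can be sketched as follows: form Bartlett-type factor scores, take the profile residuals, and standardize them by their (singular) covariance, which under the model yields a chi-square with p - k degrees of freedom, matching the 10-subtests-minus-4-factors value mentioned above. The loading structure is invented, and the exact d_ri^2 defined in the paper may differ in detail from this construction.

```python
import numpy as np

rng = np.random.default_rng(6)
p, k = 10, 4                                 # 10 subtests, 4 factors (as in WAIS-IV)
lam = np.zeros((p, k))
lam[np.arange(p), np.arange(p) % k] = 0.7    # simple-structure loadings (illustrative)
phi = np.full((k, k), 0.5); np.fill_diagonal(phi, 1.0)
sigma = lam @ phi @ lam.T
psi = 1.0 - np.diag(sigma)                   # uniquenesses of standardized subtests
sigma += np.diag(psi)

def m_distance(x):
    """Residual-based Mahalanobis distance; ~ chi2(p - k) under the model."""
    w = np.linalg.inv(lam.T @ (lam / psi[:, None])) @ (lam / psi[:, None]).T
    m = np.eye(p) - lam @ w                  # idempotent residual-maker, rank p - k
    r = m @ x                                # profile residuals
    cov_r = m @ sigma @ m.T                  # singular residual covariance
    return r @ np.linalg.pinv(cov_r) @ r

x_ok = rng.multivariate_normal(np.zeros(p), sigma)   # model-consistent profile
x_odd = 2.0 * rng.standard_normal(p)                 # profile ignoring the model
print(round(m_distance(x_ok), 2), round(m_distance(x_odd), 2))  # vs chi2(6)
```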