
Latest publications in Educational and Psychological Measurement

On the Importance of Coefficient Alpha for Measurement Research: Loading Equality Is Not Necessary for Alpha's Utility as a Scale Reliability Index.
IF 2.1 · CAS Tier 3 (Psychology) · JCR Q2 (Mathematics, Interdisciplinary Applications) · Pub Date: 2023-08-01 · Epub Date: 2022-07-20 · DOI: 10.1177/00131644221104972
Tenko Raykov, James C Anthony, Natalja Menold

The population relationship between coefficient alpha and scale reliability is studied in the widely used setting of unidimensional multicomponent measuring instruments. It is demonstrated that for any set of component loadings on the common factor, regardless of the extent of their inequality, the discrepancy between alpha and reliability can be arbitrarily small in any considered population and hence practically ignorable. In addition, the set of parameter values where this discrepancy is negligible is shown to possess the same dimensionality as that of the underlying model parameter space. The article contributes to the measurement and related literature by pointing out that (a) approximate or strict loading identity is not a necessary condition for the utility of alpha as a trustworthy index of scale reliability, and (b) coefficient alpha can be a dependable reliability measure with any extent of inequality in the component loadings.
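The population quantities discussed in this abstract can be checked directly: for a one-factor model with standardized items, both coefficient alpha and the true (omega) reliability of the sum score follow from the model-implied covariance matrix. A minimal numerical sketch with hypothetical, deliberately unequal loadings:

```python
import numpy as np

# Hypothetical one-factor model with clearly unequal loadings
# (standardized items, so unique variance = 1 - loading^2).
loadings = np.array([0.9, 0.7, 0.5, 0.6, 0.8])
uniq = 1 - loadings**2
k = len(loadings)

# Model-implied population covariance (here: correlation) matrix.
Sigma = np.outer(loadings, loadings)
np.fill_diagonal(Sigma, 1.0)

# Coefficient alpha computed from the population covariance matrix.
alpha = k / (k - 1) * (1 - np.trace(Sigma) / Sigma.sum())
# True scale reliability (omega) of the unit-weighted sum score.
omega = loadings.sum()**2 / (loadings.sum()**2 + uniq.sum())

print(round(alpha, 4), round(omega, 4))  # → 0.8248 0.8333
```

Despite loadings ranging from 0.5 to 0.9, the alpha-reliability discrepancy here is under 0.01, which is the kind of practically ignorable gap the article describes.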

Citations: 0
A Bayesian General Model to Account for Individual Differences in Operation-Specific Learning Within a Test.
IF 2.1 · CAS Tier 3 (Psychology) · JCR Q2 (Mathematics, Interdisciplinary Applications) · Pub Date: 2023-08-01 · Epub Date: 2022-09-19 · DOI: 10.1177/00131644221109796
José H Lozano, Javier Revuelta

The present paper introduces a general multidimensional model to measure individual differences in learning within a single administration of a test. Learning is assumed to result from practicing the operations involved in solving the items. The model accounts for the possibility that the ability to learn may manifest differently for correct and incorrect responses, which allows for distinguishing different types of learning effects in the data. Model estimation and evaluation are based on a Bayesian framework. A simulation study is presented that examines the performance of the estimation and evaluation methods. The results show accuracy in parameter recovery as well as good performance in model evaluation and selection. An empirical study illustrates the applicability of the model to data from a logical ability test.
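The idea of within-test, operation-specific learning can be sketched with a Rasch-type response function in which ability grows with the number of items already practiced. This is a simplified illustration, not the authors' full multidimensional Bayesian model, and every parameter value is hypothetical:

```python
import numpy as np

def expected_score(theta, delta, b):
    """Expected test score under a Rasch-type sketch in which ability
    grows by delta after each practiced item (within-test learning).
    theta: initial ability; delta: learning rate; b: item difficulties
    in administration order. All values are hypothetical."""
    pos = np.arange(len(b))                          # practice opportunities so far
    eta = theta + delta * pos - np.asarray(b, dtype=float)
    return float((1 / (1 + np.exp(-eta))).sum())

b = np.zeros(20)                       # 20 items of equal difficulty
learner = expected_score(0.0, 0.3, b)  # positive operation-specific learning
static = expected_score(0.0, 0.0, b)   # no learning: expected score is 10.0
print(round(learner, 2), round(static, 2))
```

The learner's expected score exceeds the non-learner's even though both start at the same ability, which is the individual difference such a model tries to separate from initial ability.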

Citations: 0
Awareness Is Bliss: How Acquiescence Affects Exploratory Factor Analysis.
IF 2.7 · CAS Tier 3 (Psychology) · JCR Q1 (Social Sciences) · Pub Date: 2023-06-01 · DOI: 10.1177/00131644221089857
E Damiano D'Urso, Jesper Tijmstra, Jeroen K Vermunt, Kim De Roover

Assessing the measurement model (MM) of self-report scales is crucial to obtain valid measurements of individuals' latent psychological constructs. This entails evaluating the number of measured constructs and determining which construct is measured by which item. Exploratory factor analysis (EFA) is the most-used method to evaluate these psychometric properties, where the number of measured constructs (i.e., factors) is assessed, and, afterward, rotational freedom is resolved to interpret these factors. This study assessed the effects of an acquiescence response style (ARS) on EFA for unidimensional and multidimensional (un)balanced scales. Specifically, we evaluated (a) whether ARS is captured as an additional factor, (b) the effect of different rotation approaches on the recovery of the content and ARS factors, and (c) the effect of extracting the additional ARS factor on the recovery of factor loadings. ARS was often captured as an additional factor in balanced scales when it was strong. For these scales, failing to extract this additional ARS factor, or rotating to simple structure when extracting it, harmed the recovery of the original MM by introducing bias in loadings and cross-loadings. These issues were avoided by using informed rotation approaches (i.e., target rotation), where (part of) the rotation target is specified according to a priori expectations on the MM. Not extracting the additional ARS factor did not affect the loading recovery in unbalanced scales. Researchers should consider the potential presence of ARS when assessing the psychometric properties of balanced scales and use informed rotation approaches when suspecting that an additional factor is an ARS factor.
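The claim that a strong ARS surfaces as an additional factor in a balanced scale can be illustrated at the population level: with mixed-sign content loadings and a uniform positive acquiescence loading, the model-implied correlation matrix has two dominant eigenvalues. A sketch with hypothetical loadings:

```python
import numpy as np

# Balanced 6-item scale: content loadings of mixed sign (balanced keying)
# plus a strong, uniform acquiescence (ARS) loading. Values hypothetical.
content = np.array([0.7, 0.7, 0.7, -0.7, -0.7, -0.7])
ars = np.full(6, 0.4)
L = np.column_stack([content, ars])   # 6 x 2 population loading matrix

# Model-implied correlation matrix (unit variances on the diagonal).
Sigma = L @ L.T
np.fill_diagonal(Sigma, 1.0)

eigs = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
print(np.round(eigs[:3], 2))  # two eigenvalues above 1: ARS appears as an extra factor
```

Because the content column is balanced, it is orthogonal to the uniform ARS column, so the acquiescence variance cannot be absorbed by the content factor and shows up as a second dimension, exactly the situation in which the abstract recommends informed (target) rotation.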

Citations: 1
Scoring Graphical Responses in TIMSS 2019 Using Artificial Neural Networks.
IF 2.7 · CAS Tier 3 (Psychology) · JCR Q1 (Social Sciences) · Pub Date: 2023-06-01 · Epub Date: 2022-05-23 · DOI: 10.1177/00131644221098021
Matthias von Davier, Lillian Tyack, Lale Khorramdel

Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our results show that convolutional neural networks (CNNs) outperform feed-forward neural networks in both loss and accuracy. The CNN models classified up to 97.53% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate than, typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace the workload and cost of second human raters for international large-scale assessments (ILSAs), while improving the validity and comparability of scoring complex constructed-response items.
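The training-sample selection idea rests on the expected response function from item response theory. A minimal sketch for a hypothetical 2PL item (the discrimination and difficulty values are assumptions, not TIMSS estimates):

```python
import numpy as np

def expected_response(theta, a=1.2, b=0.0):
    """Expected response function of a hypothetical 2PL item: the
    model-implied probability of a correct response at ability theta.
    (Parameters a and b are illustrative assumptions.)"""
    return 1 / (1 + np.exp(-a * (np.asarray(theta, dtype=float) - b)))

# Spread the human-rated training sample across the ability range so the
# classifier sees responses at every level of the expected response function,
# not just the easy, clearly correct or clearly incorrect cases.
thetas = np.linspace(-3, 3, 7)
erf = expected_response(thetas)
print(np.round(erf, 3))
```

Selecting rated responses across the full range of this curve gives the network training examples near the decision boundary (where the ERF is near 0.5), which is where a second human rater would otherwise be most needed.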

Citations: 0
Assessing Dimensionality of IRT Models Using Traditional and Revised Parallel Analyses.
IF 2.7 · CAS Tier 3 (Psychology) · JCR Q1 (Social Sciences) · Pub Date: 2023-06-01 · Epub Date: 2022-07-21 · DOI: 10.1177/00131644221111838
Wenjing Guo, Youn-Jeng Choi

Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been systematically investigated. Therefore, we evaluated the accuracy of traditional and revised parallel analyses for determining the number of underlying dimensions in the IRT framework by conducting simulation studies. Six data generation factors were manipulated: number of observations, test length, type of generation models, number of dimensions, correlations between dimensions, and item discrimination. Results indicated that (a) when the generated IRT model is unidimensional, across all simulation conditions, traditional parallel analysis using principal component analysis and tetrachoric correlation performs best; (b) when the generated IRT model is multidimensional, traditional parallel analysis using principal component analysis and tetrachoric correlation yields the highest proportion of accurately identified underlying dimensions across all factors, except when the correlation between dimensions is 0.8 or the item discrimination is low; and (c) under a few combinations of simulated factors, none of the eight methods performed well (e.g., when the generation model is three-dimensional 3PL, the item discrimination is low, and the correlation between dimensions is 0.8).
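Traditional parallel analysis retains a dimension whenever its observed eigenvalue exceeds the corresponding mean eigenvalue from random data. The sketch below uses PCA on Pearson correlations for simplicity; as the abstract notes, the better-performing variant for binary IRT data uses tetrachoric correlations, which would need an extra estimation routine. All simulation settings are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def parallel_analysis(X, n_sims=50):
    """Traditional parallel analysis with PCA: keep components whose
    observed eigenvalue exceeds the mean eigenvalue of random data.
    (A tetrachoric-correlation variant would replace np.corrcoef here.)"""
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X.T)))[::-1]
    ref = np.zeros(p)
    for _ in range(n_sims):
        R = rng.standard_normal((n, p))          # random comparison data
        ref += np.sort(np.linalg.eigvalsh(np.corrcoef(R.T)))[::-1]
    return int((obs > ref / n_sims).sum())

# Simulate unidimensional binary responses from a Rasch-like model.
n, p = 1000, 10
theta = rng.standard_normal(n)
b = np.linspace(-1, 1, p)
prob = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.random((n, p)) < prob).astype(float)

n_dims = parallel_analysis(X)
print(n_dims)
```

With a genuinely unidimensional generating model and adequate sample size, only the first observed eigenvalue should exceed its random-data reference, so the routine reports one dimension.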

Citations: 0
Changes in the Speed-Ability Relation Through Different Treatments of Rapid Guessing.
IF 2.1 · CAS Tier 3 (Psychology) · JCR Q2 (Mathematics, Interdisciplinary Applications) · Pub Date: 2023-06-01 · Epub Date: 2022-07-11 · DOI: 10.1177/00131644221109490
Tobias Deribo, Frank Goldhammer, Ulf Kroehne

As researchers in the social sciences, we are often interested in studying constructs that are not directly observable through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed briefly rather than read and engaged with in depth. Hence, a response given under rapid-guessing behavior biases the constructs and relations of interest. Comparable bias can be expected for latent speed estimates obtained under rapid-guessing behavior, as well as for the identified relation between speed and ability. This bias seems especially problematic considering that the relation between speed and ability has been shown to improve precision in ability estimation. For this reason, we investigate if and how responses and response times obtained under rapid-guessing behavior affect the identified speed-ability relation and the precision of ability estimates in a joint model of speed and ability. The study presents an empirical application that highlights a specific methodological problem resulting from rapid-guessing behavior. Here, we show that different (non-)treatments of rapid guessing can lead to different conclusions about the underlying speed-ability relation. Furthermore, different rapid-guessing treatments led to wildly different conclusions about gains in precision through joint modeling. The results show the importance of taking rapid guessing into account when the psychometric use of response times is of interest.
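The effect of (non-)treatment of rapid guessing can be shown with a toy simulation: scoring rapid guesses like ordinary responses dilutes the score-ability relation, while treating them as missing restores it. The guessing rate, chance level, and response model below are hypothetical and much simpler than the joint speed-ability model in the study:

```python
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_items = 2000, 30

# Hypothetical setup: 30% of responses are rapid guesses that are correct
# only at a 25% chance level, unrelated to ability; engaged responses
# follow a simple logistic (Rasch-like) model.
theta = rng.standard_normal(n_persons)
rapid = rng.random((n_persons, n_items)) < 0.30
p_engaged = 1 / (1 + np.exp(-theta[:, None]))
correct = np.where(rapid,
                   rng.random((n_persons, n_items)) < 0.25,
                   rng.random((n_persons, n_items)) < p_engaged)

naive_score = correct.mean(axis=1)          # non-treatment: keep all responses
treated = np.where(rapid, np.nan, correct)  # treatment: guesses set to missing
treated_score = np.nanmean(treated, axis=1)

r_naive = np.corrcoef(theta, naive_score)[0, 1]
r_treated = np.corrcoef(theta, treated_score)[0, 1]
print(round(r_naive, 3), round(r_treated, 3))
```

The treated score correlates more strongly with true ability, and by the same logic the treatment chosen for rapid guesses shifts any estimated speed-ability relation, which is the paper's central point.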

Citations: 0
The Impact of Sample Size and Various Other Factors on Estimation of Dichotomous Mixture IRT Models.
IF 2.7 · CAS Tier 3 (Psychology) · JCR Q1 (Social Sciences) · Pub Date: 2023-06-01 · Epub Date: 2022-05-19 · DOI: 10.1177/00131644221094325
Sedat Sen, Allan S Cohen

The purpose of this study was to examine the effects of different data conditions on item parameter recovery and classification accuracy of three dichotomous mixture item response theory (IRT) models: the Mix1PL, Mix2PL, and Mix3PL. Manipulated factors in the simulation included the sample size (11 different sample sizes from 100 to 5000), test length (10, 30, and 50), number of classes (2 and 3), the degree of latent class separation (normal/no separation, small, medium, and large), and class sizes (equal vs. nonequal). Effects were assessed using root mean square error (RMSE) and classification accuracy percentage computed between true parameters and estimated parameters. The results of this simulation study showed that more precise estimates of item parameters were obtained with larger sample sizes and longer test lengths. Item parameter recovery deteriorated as the number of classes increased and the sample size decreased. Classification accuracy was also better for conditions with two-class solutions than for three-class solutions. Results of both item parameter estimates and classification accuracy differed by model type. More complex models and models with larger class separations produced less accurate results. The effect of the mixture proportions also differentially affected RMSE and classification accuracy results. Groups of equal size produced more precise item parameter estimates, but the reverse was the case for classification accuracy results. Results suggested that dichotomous mixture IRT models required more than 2,000 examinees to be able to obtain stable results, as even shorter tests required such large sample sizes for more precise estimates. This number increased as the number of latent classes, the degree of separation, and model complexity increased.
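The two evaluation criteria named in the abstract, RMSE between true and estimated parameters and the percentage of correctly classified examinees, are simple to compute once estimates are in hand. A minimal sketch with illustrative values only:

```python
import numpy as np

def rmse(true, est):
    """Root mean square error between true and estimated parameters."""
    true, est = np.asarray(true, dtype=float), np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((true - est) ** 2)))

def class_accuracy(true_cls, est_cls):
    """Percentage of examinees assigned to their true latent class
    (assumes label switching has already been resolved)."""
    return 100.0 * float(np.mean(np.asarray(true_cls) == np.asarray(est_cls)))

# Illustrative values only, not results from the study.
print(rmse([0.0, 1.0, -1.0], [0.1, 0.9, -1.2]))    # → 0.1414...
print(class_accuracy([0, 0, 1, 1], [0, 1, 1, 1]))  # → 75.0
```

In a mixture IRT simulation these would be computed per replication and averaged; resolving label switching before scoring classification accuracy is essential, or accuracy for a relabeled but correct solution would be understated.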

Citations: 0
A Small Sample Correction for Factor Score Regression.
IF 2.7 · CAS Tier 3 (Psychology) · JCR Q1 (Social Sciences) · Pub Date: 2023-06-01 · Epub Date: 2022-07-02 · DOI: 10.1177/00131644221105505
Jasper Bogaert, Wen Wei Loh, Yves Rosseel

Factor score regression (FSR) is widely used as a convenient alternative to traditional structural equation modeling (SEM) for assessing structural relations between latent variables. But when latent variables are simply replaced by factor scores, biases in the structural parameter estimates often have to be corrected, due to the measurement error in the factor scores. The method of Croon (MOC) is a well-known bias correction technique. However, its standard implementation can render poor-quality estimates in small samples (e.g., fewer than 100). This article aims to develop a small sample correction (SSC) that integrates two different modifications to the standard MOC. We conducted a simulation study to compare the empirical performance of (a) standard SEM, (b) the standard MOC, (c) naive FSR, and (d) the MOC with the proposed SSC. In addition, we assessed the robustness of the performance of the SSC in various models with a different number of predictors and indicators. The results showed that the MOC with the proposed SSC yielded smaller mean squared errors than SEM and the standard MOC in small samples and performed similarly to naive FSR. However, naive FSR yielded more biased estimates than the proposed MOC with SSC, by failing to account for measurement error in the factor scores.
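The logic of Croon's correction can be shown at the population level: with observed-score composites, the naive FSR slope is attenuated by the measurement error carried into the scores, while dividing out the loading and error terms recovers the structural slope exactly. A sketch with hypothetical loadings and a true slope of 0.5; unit-weight (sum) scores are used for simplicity, whereas the MOC is usually stated for regression factor scores:

```python
import numpy as np

# Population illustration of Croon-style bias correction with unit-weight
# (sum) factor scores. True structural slope and loadings are hypothetical.
beta = 0.5
lx = np.array([0.8, 0.7, 0.6])   # loadings of the x-indicators on the predictor factor
ly = np.array([0.7, 0.7, 0.7])   # loadings of the y-indicators on the outcome factor
tx = 1 - lx**2                   # unique variances (standardized items)

Lx, Ly = lx.sum(), ly.sum()
cov_scores = Lx * Ly * beta      # cov(sum score x, sum score y) implied by the model
var_fx = Lx**2 + tx.sum()        # var(sum score x): true part + error part

naive = cov_scores / var_fx      # naive FSR slope, attenuated by measurement error
# Croon-style correction: rescale the covariance and strip the error
# variance from the predictor score before forming the slope.
croon = (cov_scores / (Lx * Ly)) / ((var_fx - tx.sum()) / Lx**2)

print(round(naive, 3), round(croon, 3))  # → 0.372 0.5
```

In finite samples the loadings and unique variances in the denominator must themselves be estimated, which is precisely where the standard MOC becomes unstable and where the article's small sample correction intervenes.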

Evaluation of Polytomous Item Locations in Multicomponent Measuring Instruments: A Note on a Latent Variable Modeling Procedure.
IF 2.7, CAS Tier 3 (Psychology), JCR Q1 Social Sciences. Pub Date: 2023-06-01; Epub Date: 2022-03-02. DOI: 10.1177/00131644211072829
Tenko Raykov, Martin Pusic

This note is concerned with evaluation of location parameters for polytomous items in multiple-component measuring instruments. A point and interval estimation procedure for these parameters is outlined that is developed within the framework of latent variable modeling. The method permits educational, behavioral, biomedical, and marketing researchers to quantify important aspects of the functioning of items with ordered multiple response options, which follow the popular graded response model. The procedure is routinely and readily applicable in empirical studies using widely circulated software and is illustrated with empirical data.
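For readers unfamiliar with the graded response model the note builds on, the sketch below shows what the location parameters quantify. The item parameters are hypothetical, and this is not the authors' latent variable modeling estimation procedure: each location b_k is the trait level at which the cumulative curve P(Y >= k | theta) crosses .5, and category probabilities are differences of adjacent cumulative curves.

```python
import numpy as np

def grm_cum_probs(theta, a, b):
    """P(Y >= k | theta) for k = 1..K-1 under the graded response model.
    a: discrimination; b: ordered location (threshold) parameters."""
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def grm_category_probs(theta, a, b):
    """P(Y = k | theta) for k = 0..K-1, as differences of cumulative curves."""
    cum = np.concatenate(([1.0], grm_cum_probs(theta, a, b), [0.0]))
    return cum[:-1] - cum[1:]

# Hypothetical four-category item: one discrimination, three ordered locations.
a, b = 1.5, [-1.0, 0.2, 1.1]
probs = grm_category_probs(0.2, a, b)   # sums to 1 across the four categories
# Because theta == b[1] here, the second cumulative curve equals exactly .5,
# which is precisely what the location parameter encodes.
```

The category probabilities always sum to one, and shifting theta across a location b_k moves probability mass from the categories below k to those at or above it.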

Is the Area Under Curve Appropriate for Evaluating the Fit of Psychometric Models?
IF 2.7, CAS Tier 3 (Psychology), JCR Q1 Social Sciences. Pub Date: 2023-06-01; Epub Date: 2022-05-24. DOI: 10.1177/00131644221098182
Yuting Han, Jihong Zhang, Zhehan Jiang, Dexin Shi

In the literature of modern psychometric modeling, mostly related to item response theory (IRT), model fit is evaluated through known indices, such as χ², M2, and the root mean square error of approximation (RMSEA) for absolute assessment, as well as the Akaike information criterion (AIC), consistent AIC (CAIC), and Bayesian information criterion (BIC) for relative comparison. Recent developments show a merging trend of psychometrics and machine learning, yet a gap remains in model fit evaluation, specifically regarding the use of the area under the curve (AUC). This study focuses on the behavior of AUC in fitting IRT models. Rounds of simulations were conducted to investigate AUC's appropriateness (e.g., power and Type I error rate) under various conditions. The results show that AUC possessed certain advantages under certain conditions, such as high-dimensional structures with two-parameter logistic (2PL) and some three-parameter logistic (3PL) models, while its disadvantages were also obvious when the true model is unidimensional. This cautions researchers about the dangers of relying solely on AUC to evaluate psychometric models.
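AUC itself is model-agnostic: it equals the probability that a randomly chosen endorsed (1) response receives a higher model-predicted probability than a randomly chosen non-endorsed (0) response, with ties counting half. The sketch below is a minimal rank-based implementation applied to data generated from a 2PL model with hypothetical parameters; it illustrates the quantity the study evaluates, not the study's simulation design.

```python
import numpy as np

def auc(y_true, y_score):
    """Area under the ROC curve via the Mann-Whitney statistic: the share of
    (positive, negative) pairs where the positive outscores the negative."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

# Toy use with a 2PL model: P(correct) = logistic(a * (theta - b)).
rng = np.random.default_rng(7)
theta = rng.standard_normal(500)           # hypothetical examinee abilities
a_par, b_par = 1.2, 0.0                    # hypothetical item parameters
p = 1.0 / (1.0 + np.exp(-a_par * (theta - b_par)))
y = rng.binomial(1, p)                     # simulated item responses
fit_auc = auc(y, p)   # how well fitted probabilities separate 1s from 0s
```

Because item responses are inherently noisy even under the true model, the resulting AUC sits well below 1; this ceiling effect is one reason using AUC as the sole fit criterion can mislead, as the abstract cautions.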
