
Latest Articles from Educational and Psychological Measurement

Are Speeded Tests Unfair? Modeling the Impact of Time Limits on the Gender Gap in Mathematics.
IF 2.7 | CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-08-01 | DOI: 10.1177/00131644221111076
Andrea H Stoevenbelt, Jelte M Wicherts, Paulette C Flore, Lorraine A T Phillips, Jakob Pietschnig, Bruno Verschuere, Martin Voracek, Inga Schwabe

When cognitive and educational tests are administered under time limits, tests may become speeded and this may affect the reliability and validity of the resulting test scores. Prior research has shown that time limits may create or enlarge gender gaps in cognitive and academic testing. On average, women complete fewer items than men when a test is administered with a strict time limit, whereas gender gaps are frequently reduced when time limits are relaxed. In this study, we propose that gender differences in test strategy might inflate gender gaps favoring men, and relate test strategy to stereotype threat effects under which women underperform due to the pressure of negative stereotypes about their performance. First, we applied a Bayesian two-dimensional item response theory (IRT) model to data obtained from two registered reports that investigated stereotype threat in mathematics, and estimated the latent correlation between underlying test strategy (here, completion factor, a proxy for working speed) and mathematics ability. Second, we tested the gender gap and assessed potential effects of stereotype threat on female test performance. We found a positive correlation between the completion factor and mathematics ability, such that more able participants dropped out later in the test. We did not observe a stereotype threat effect but found larger gender differences on the latent completion factor than on latent mathematical ability, suggesting that test strategies affect the gender gap in timed mathematics performance. We argue that if the effect of time limits on tests is not taken into account, this may lead to test unfairness and biased group comparisons, and urge researchers to consider these effects in either their analyses or study planning.
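The completion-factor mechanism described above can be sketched with a small simulation; the latent correlation of 0.4, the item counts, and the completion rule below are assumed for illustration and are not the study's data:

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_items = 5000, 30

# Latent ability and working speed, positively correlated (0.4 is assumed).
cov = [[1.0, 0.4], [0.4, 1.0]]
ability, speed = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Under a strict time limit, the number of items reached depends on speed.
completed = np.clip(np.round(20 + 5 * speed), 5, n_items).astype(int)

# Unreached items score 0, so the observed sum score mixes ability with speed.
p_correct = 1.0 / (1.0 + np.exp(-ability))  # one-parameter logistic sketch
score = np.array([rng.binomial(c, p) for c, p in zip(completed, p_correct)])

r = np.corrcoef(score, ability)[0, 1]
print(f"observed score-ability correlation: {r:.2f}")
```

Because the sum score depends on both latent variables, any group difference in speed alone will surface as a score gap under the time limit, which is the unfairness the authors warn about.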

Citations: 1
A Robust Method for Detecting Item Misfit in Large-Scale Assessments.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-08-01 | Epub Date: 2022-07-02 | DOI: 10.1177/00131644221105819
Matthias von Davier, Ummugul Bezirhan

Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population independence of item functions are present even in classical test theory but are more explicitly stated when using item response theory or other latent variable models for the assessment of item fit. The work presented here provides a robust approach for DIF detection that does not assume perfect model data fit, but rather uses Tukey's concept of contaminated distributions. The approach uses robust outlier detection to flag items for which adequate model data fit cannot be established.
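A minimal sketch of the robust flagging idea, assuming item-fit statistics that are mostly well behaved with a few contaminating outliers; the cutoff c = 3 and the example values are illustrative, not the paper's procedure in full:

```python
import numpy as np

def robust_flags(fit_stats, c=3.0):
    """Flag items as misfitting under a contaminated-distribution view:
    the median and MAD are insensitive to the outliers themselves,
    unlike the mean and standard deviation."""
    fit_stats = np.asarray(fit_stats, dtype=float)
    med = np.median(fit_stats)
    mad = 1.4826 * np.median(np.abs(fit_stats - med))  # normal-consistent scale
    return np.abs(fit_stats - med) / mad > c

stats = [0.1, -0.3, 0.2, 0.05, -0.1, 4.8, 0.15]  # sixth item is contaminated
flags = robust_flags(stats)
print(flags)  # only the sixth item is flagged
```

Using the mean and standard deviation instead would let the outlier inflate the scale estimate and mask itself, which is exactly what the robust statistics avoid.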

Citations: 0
On the Importance of Coefficient Alpha for Measurement Research: Loading Equality Is Not Necessary for Alpha's Utility as a Scale Reliability Index.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-08-01 | Epub Date: 2022-07-20 | DOI: 10.1177/00131644221104972
Tenko Raykov, James C Anthony, Natalja Menold

The population relationship between coefficient alpha and scale reliability is studied in the widely used setting of unidimensional multicomponent measuring instruments. It is demonstrated that for any set of component loadings on the common factor, regardless of the extent of their inequality, the discrepancy between alpha and reliability can be arbitrarily small in any considered population and hence practically ignorable. In addition, the set of parameter values where this discrepancy is negligible is shown to possess the same dimensionality as that of the underlying model parameter space. The article contributes to the measurement and related literature by pointing out that (a) approximate or strict loading identity is not a necessary condition for the utility of alpha as a trustworthy index of scale reliability, and (b) coefficient alpha can be a dependable reliability measure with any extent of inequality in the component loadings.
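The claim that unequal loadings need not make alpha a poor reliability index can be checked numerically; the loadings below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
loadings = np.array([0.8, 0.7, 0.5, 0.4])   # clearly unequal
error_sd = np.sqrt(1.0 - loadings**2)       # unit-variance items

factor = rng.standard_normal(n)
X = factor[:, None] * loadings + rng.standard_normal((n, 4)) * error_sd

# Sample coefficient alpha from the item covariance matrix.
k = X.shape[1]
cov = np.cov(X, rowvar=False)
alpha = k / (k - 1) * (1.0 - np.trace(cov) / cov.sum())

# Population reliability (omega) of the sum score from the true loadings.
omega = loadings.sum()**2 / (loadings.sum()**2 + (error_sd**2).sum())
print(f"alpha = {alpha:.3f}, reliability = {omega:.3f}")
```

Despite loadings ranging from 0.4 to 0.8, the alpha-reliability discrepancy here is on the order of 0.02, consistent with the article's argument that the discrepancy can be practically ignorable.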

Citations: 0
A Bayesian General Model to Account for Individual Differences in Operation-Specific Learning Within a Test.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-08-01 | Epub Date: 2022-09-19 | DOI: 10.1177/00131644221109796
José H Lozano, Javier Revuelta

The present paper introduces a general multidimensional model to measure individual differences in learning within a single administration of a test. Learning is assumed to result from practicing the operations involved in solving the items. The model accounts for the possibility that the ability to learn may manifest differently for correct and incorrect responses, which allows for distinguishing different types of learning effects in the data. Model estimation and evaluation is based on a Bayesian framework. A simulation study is presented that examines the performance of the estimation and evaluation methods. The results show accuracy in parameter recovery as well as good performance in model evaluation and selection. An empirical study illustrates the applicability of the model to data from a logical ability test.
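One way to read "learning from practicing the operations" is as a practice-count boost in the success logit. The model form and parameter values below are an illustrative simplification, not the paper's full multidimensional Bayesian specification:

```python
import numpy as np

def p_correct(theta, gamma, b, practice):
    """Success probability when ability theta is boosted by an
    operation-specific learning effect gamma times the number of
    prior practice opportunities (all values illustrative)."""
    return 1.0 / (1.0 + np.exp(-(theta + gamma * practice - b)))

theta, gamma = 0.0, 0.15      # average examinee, modest learning rate
difficulties = np.zeros(10)   # ten equally difficult items
probs = p_correct(theta, gamma, difficulties, np.arange(10))
print(np.round(probs, 3))     # success probability rises across the test
```

Individual differences in learning would enter by giving gamma a person-specific distribution, which is the latent dimension the model measures alongside ability.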

Citations: 0
Awareness Is Bliss: How Acquiescence Affects Exploratory Factor Analysis.
IF 2.7 | CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-06-01 | DOI: 10.1177/00131644221089857
E Damiano D'Urso, Jesper Tijmstra, Jeroen K Vermunt, Kim De Roover

Assessing the measurement model (MM) of self-report scales is crucial to obtain valid measurements of individuals' latent psychological constructs. This entails evaluating the number of measured constructs and determining which construct is measured by which item. Exploratory factor analysis (EFA) is the most-used method to evaluate these psychometric properties, where the number of measured constructs (i.e., factors) is assessed, and, afterward, rotational freedom is resolved to interpret these factors. This study assessed the effects of an acquiescence response style (ARS) on EFA for unidimensional and multidimensional (un)balanced scales. Specifically, we evaluated (a) whether ARS is captured as an additional factor, (b) the effect of different rotation approaches on the content and ARS factors recovery, and (c) the effect of extracting the additional ARS factor on the recovery of factor loadings. ARS was often captured as an additional factor in balanced scales when it was strong. For these scales, ignoring extracting this additional ARS factor, or rotating to simple structure when extracting it, harmed the recovery of the original MM by introducing bias in loadings and cross-loadings. These issues were avoided by using informed rotation approaches (i.e., target rotation), where (part of) the rotation target is specified according to a priori expectations on the MM. Not extracting the additional ARS factor did not affect the loading recovery in unbalanced scales. Researchers should consider the potential presence of ARS when assessing the psychometric properties of balanced scales and use informed rotation approaches when suspecting that an additional factor is an ARS factor.
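The informed-rotation idea can be sketched with an orthogonal Procrustes rotation toward a partially specified target that includes the expected ARS column. The balanced 6-item scale, the loading values, and the target entries are all assumed:

```python
import numpy as np

# True MM for a balanced 6-item scale: one content factor (half the items
# reverse-keyed) plus an ARS factor loading equally on every item.
content = np.array([0.7, 0.7, 0.7, -0.7, -0.7, -0.7])
ars = np.full(6, 0.3)
L_true = np.column_stack([content, ars])

# An arbitrary rotation of the true loadings stands in for unrotated EFA output.
ang = 0.9
R = np.array([[np.cos(ang), -np.sin(ang)],
              [np.sin(ang),  np.cos(ang)]])
A = L_true @ R

# Informed (orthogonal Procrustes) rotation toward a target encoding the a
# priori expectation, including the ARS column -- not simple structure.
target = np.sign(L_true) * 0.5
U, _, Vt = np.linalg.svd(A.T @ target)
L_rot = A @ (U @ Vt)

print(np.round(L_rot, 2))  # recovers the content + ARS pattern
```

Rotating the same matrix toward simple structure instead would try to concentrate each item on one factor and smear the ARS variance into biased cross-loadings, which is the failure mode the study documents for balanced scales.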

Citations: 1
Scoring Graphical Responses in TIMSS 2019 Using Artificial Neural Networks.
IF 2.7 | CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-06-01 | Epub Date: 2022-05-23 | DOI: 10.1177/00131644221098021
Matthias von Davier, Lillian Tyack, Lale Khorramdel

Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We are comparing classification accuracy of convolutional and feed-forward approaches. Our results show that convolutional neural networks (CNNs) outperform feed-forward neural networks in both loss and accuracy. The CNN models classified up to 97.53% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate, than typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace the workload and cost of second human raters for international large-scale assessments (ILSAs), while improving the validity and comparability of scoring complex constructed-response items.
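The convolutional feature extraction that gives CNNs their edge over feed-forward networks can be illustrated in plain NumPy with a single hand-set kernel; a real scoring model would learn many kernels from human-rated responses, so everything below is a toy stand-in:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D cross-correlation -- the core operation a CNN layer
    applies to an image response before classification."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Downsample by taking the maximum over non-overlapping blocks."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.random((8, 8))                     # stands in for a scanned drawing
edge = np.array([[1.0, -1.0], [1.0, -1.0]])  # a hand-set vertical-edge kernel

features = max_pool(np.maximum(conv2d(img, edge), 0.0))  # conv -> ReLU -> pool
print(features.shape)
```

A feed-forward network sees the flattened pixels with no notion of locality; the conv/pool stack instead responds to local strokes wherever they appear, which is why it suits drawings whose content, not position, determines the score.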

Citations: 0
Assessing Dimensionality of IRT Models Using Traditional and Revised Parallel Analyses.
IF 2.7 | CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-06-01 | Epub Date: 2022-07-21 | DOI: 10.1177/00131644221111838
Wenjing Guo, Youn-Jeng Choi

Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been systematically investigated. Therefore, we evaluated the accuracy of traditional and revised parallel analyses for determining the number of underlying dimensions in the IRT framework by conducting simulation studies. Six data generation factors were manipulated: number of observations, test length, type of generation models, number of dimensions, correlations between dimensions, and item discrimination. Results indicated that (a) when the generated IRT model is unidimensional, across all simulation conditions, traditional parallel analysis using principal component analysis and tetrachoric correlation performs best; (b) when the generated IRT model is multidimensional, traditional parallel analysis using principal component analysis and tetrachoric correlation yields the highest proportion of accurately identified underlying dimensions across all factors, except when the correlation between dimensions is 0.8 or the item discrimination is low; and (c) under a few combinations of simulated factors, none of the eight methods performed well (e.g., when the generation model is three-dimensional 3PL, the item discrimination is low, and the correlation between dimensions is 0.8).
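Traditional PCA-based parallel analysis, the best performer in most of these conditions, is short to implement. The sketch below uses Pearson correlations on continuous data for brevity, whereas the study's dichotomous setting would call for tetrachoric correlations:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Retain components whose observed eigenvalue exceeds the mean
    eigenvalue of random data of the same shape (traditional criterion)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eig = np.zeros(p)
    for _ in range(n_sims):
        sim = rng.standard_normal((n, p))
        rand_eig += np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    rand_eig /= n_sims
    return int(np.sum(obs_eig > rand_eig))

# One common factor driving 6 indicators should yield one retained dimension.
rng = np.random.default_rng(1)
f = rng.standard_normal((500, 1))
X = f @ np.full((1, 6), 0.7) + 0.5 * rng.standard_normal((500, 6))
n_dims = parallel_analysis(X)
print(n_dims)
```

The comparison against random-data eigenvalues is what distinguishes parallel analysis from the bare Kaiser rule, and the choice of correlation type is one of the revised-versus-traditional design decisions the study manipulates.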

Citations: 0
Changes in the Speed-Ability Relation Through Different Treatments of Rapid Guessing.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-06-01 | Epub Date: 2022-07-11 | DOI: 10.1177/00131644221109490
Tobias Deribo, Frank Goldhammer, Ulf Kroehne

As researchers in the social sciences, we are often interested in studying not directly observable constructs through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed shortly but not read and engaged with in-depth. Hence, a response given under rapid-guessing behavior does bias constructs and relations of interest. Bias also appears reasonable for latent speed estimates obtained under rapid-guessing behavior, as well as the identified relation between speed and ability. This bias seems especially problematic considering that the relation between speed and ability has been shown to be able to improve precision in ability estimation. For this reason, we investigate if and how responses and response times obtained under rapid-guessing behavior affect the identified speed-ability relation and the precision of ability estimates in a joint model of speed and ability. Therefore, the study presents an empirical application that highlights a specific methodological problem resulting from rapid-guessing behavior. Here, we could show that different (non-)treatments of rapid guessing can lead to different conclusions about the underlying speed-ability relation. Furthermore, different rapid-guessing treatments led to wildly different conclusions about gains in precision through joint modeling. The results show the importance of taking rapid guessing into account when the psychometric use of response times is of interest.
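A common normative treatment flags a response as a rapid guess when its response time falls below a fraction of the item's median RT. The 10% threshold and the toy data are assumptions, and the paper's point is precisely that such treatment choices can change downstream conclusions:

```python
import numpy as np

def flag_rapid_guesses(rt, threshold_frac=0.10):
    """Flag responses faster than threshold_frac times the item's median
    response time (one simple normative threshold among several the
    literature compares)."""
    rt = np.asarray(rt, dtype=float)
    return rt < threshold_frac * np.median(rt, axis=0)

# rows = examinees, columns = items (seconds); one suspiciously fast response
rt = np.array([[30.0, 45.0],
               [28.0, 50.0],
               [ 1.5, 47.0],
               [33.0, 44.0]])
flags = flag_rapid_guesses(rt)
print(flags)
```

Flagged responses can then be treated as missing, as incorrect, or left untouched before fitting the joint speed-ability model; each choice corresponds to one of the (non-)treatments whose divergent consequences the study demonstrates.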

引用次数: 0
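The methodological point above — that how rapid guesses are treated changes the observed speed-ability relation — can be illustrated with a small simulation. This is a minimal sketch under made-up parameters, not the authors' joint model: it injects fast random guesses into simulated response data, flags them with an NT10-style threshold (responses faster than 10% of the item's median response time, one common flagging rule), and compares the observed speed-score correlation with and without the flagged responses.

```python
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_items = 500, 30

# True ability and (log) speed, moderately correlated
ability = rng.normal(size=n_persons)
speed = 0.4 * ability + rng.normal(scale=np.sqrt(1 - 0.16), size=n_persons)

# Solution behavior: lognormal RTs driven by speed; correctness driven by ability (Rasch-like)
rt = np.exp(rng.normal(loc=3.0 - speed[:, None], scale=0.4, size=(n_persons, n_items)))
p_correct = 1 / (1 + np.exp(-(ability[:, None] - rng.normal(size=n_items))))
resp = (rng.random((n_persons, n_items)) < p_correct).astype(int)

# Inject rapid guessing: ~10% of responses become fast random guesses
guess_mask = rng.random((n_persons, n_items)) < 0.10
rt[guess_mask] = rng.uniform(0.5, 2.0, size=guess_mask.sum())
resp[guess_mask] = (rng.random(guess_mask.sum()) < 0.25).astype(int)

# NT10-style threshold: flag responses faster than 10% of the item's median RT
threshold = 0.10 * np.median(rt, axis=0)
flagged = rt < threshold

def person_speed(rt, keep):
    # Observed speed proxy: mean negative log RT over retained responses
    return np.nanmean(np.where(keep, -np.log(rt), np.nan), axis=1)

def person_score(resp, keep):
    # Proportion correct over retained responses
    return np.nanmean(np.where(keep, resp, np.nan), axis=1)

keep_all = np.ones_like(flagged, dtype=bool)
r_naive = np.corrcoef(person_speed(rt, keep_all), person_score(resp, keep_all))[0, 1]
r_filtered = np.corrcoef(person_speed(rt, ~flagged), person_score(resp, ~flagged))[0, 1]
print(f"speed-score correlation, rapid guesses kept:    {r_naive:.3f}")
print(f"speed-score correlation, rapid guesses removed: {r_filtered:.3f}")
```

The two correlations diverge because rapid guesses simultaneously inflate the observed speed proxy and pull scores toward the guessing rate; which treatment is "right" is exactly the question the article raises.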
The Impact of Sample Size and Various Other Factors on Estimation of Dichotomous Mixture IRT Models.
IF 2.7 3区 Psychology Q1 Social Sciences Pub Date : 2023-06-01 Epub Date: 2022-05-19 DOI: 10.1177/00131644221094325
Sedat Sen, Allan S Cohen

The purpose of this study was to examine the effects of different data conditions on item parameter recovery and classification accuracy of three dichotomous mixture item response theory (IRT) models: the Mix1PL, Mix2PL, and Mix3PL. Manipulated factors in the simulation included sample size (11 sizes from 100 to 5,000), test length (10, 30, and 50 items), number of latent classes (2 and 3), degree of latent class separation (normal/no separation, small, medium, and large), and class sizes (equal vs. nonequal). Effects were assessed using the root mean square error (RMSE) between true and estimated parameters and the classification accuracy percentage. The results showed that more precise item parameter estimates were obtained with larger samples and longer tests. Recovery of item parameters deteriorated as the number of classes increased and the sample size decreased. Classification accuracy was also recovered better under two-class solutions than under three-class solutions. Both item parameter estimates and classification accuracy differed by model type: more complex models and models with larger class separations produced less accurate results. The mixture proportions also affected RMSE and classification accuracy differently — groups of equal size produced more precise item parameter estimates, but the reverse was the case for classification accuracy. Results suggested that dichotomous mixture IRT models require more than 2,000 examinees to obtain stable results, as even shorter tests needed sample sizes this large for precise estimates. This number increased as the number of latent classes, the degree of separation, and model complexity increased.

Citations: 0
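The simulation design above rests on two evaluation metrics: RMSE between true and estimated item parameters, and classification accuracy against true class membership. Fitting an actual mixture IRT model is beyond a short sketch, so in the following minimal illustration the "estimates" are stand-ins (true values plus noise); only the two-class 2PL (Mix2PL) data generation and the metric computations follow the abstract's design, and all numeric settings are made up.

```python
import numpy as np

rng = np.random.default_rng(11)
n_per_class, n_items = (400, 400), 30  # two equal latent classes

# Class-specific 2PL item parameters: discrimination a, difficulty b
# (class difficulty means -0.5 vs +0.5 give a moderate class separation)
a = rng.uniform(0.8, 2.0, size=(2, n_items))
b = rng.normal(loc=[[-0.5], [0.5]], scale=1.0, size=(2, n_items))

resp_blocks, true_class = [], []
for g, n in enumerate(n_per_class):
    th = rng.normal(size=(n, 1))                       # within-class abilities
    p = 1 / (1 + np.exp(-a[g] * (th - b[g])))          # 2PL response probabilities
    resp_blocks.append((rng.random((n, n_items)) < p).astype(int))
    true_class += [g] * n
resp = np.vstack(resp_blocks)
true_class = np.array(true_class)

def rmse(est, true):
    return np.sqrt(np.mean((est - true) ** 2))

# Placeholder "estimates": true parameters plus noise, and 90% correct
# class assignments, standing in for an external Mix2PL fit.
# (With a real fit, accuracy should be taken as the max over class-label
# permutations to handle label switching.)
a_hat = a + rng.normal(scale=0.1, size=a.shape)
b_hat = b + rng.normal(scale=0.1, size=b.shape)
est_class = np.where(rng.random(true_class.size) < 0.9, true_class, 1 - true_class)

print("RMSE(a):", round(rmse(a_hat, a), 3))
print("RMSE(b):", round(rmse(b_hat, b), 3))
print("classification accuracy:", round((est_class == true_class).mean(), 3))
```

In the study's full design, this generation-plus-scoring loop would be repeated across the crossed factors (sample size, test length, number of classes, separation, mixing proportions), averaging RMSE and accuracy over replications per cell.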
A Small Sample Correction for Factor Score Regression.
IF 2.7 3区 Psychology Q1 Social Sciences Pub Date : 2023-06-01 Epub Date: 2022-07-02 DOI: 10.1177/00131644221105505
Jasper Bogaert, Wen Wei Loh, Yves Rosseel

Factor score regression (FSR) is widely used as a convenient alternative to traditional structural equation modeling (SEM) for assessing structural relations between latent variables. But when latent variables are simply replaced by factor scores, biases in the structural parameter estimates often have to be corrected, owing to the measurement error in the factor scores. The method of Croon (MOC) is a well-known bias correction technique. However, its standard implementation can render poor-quality estimates in small samples (e.g., fewer than 100). This article aims to develop a small sample correction (SSC) that integrates two different modifications to the standard MOC. We conducted a simulation study to compare the empirical performance of (a) standard SEM, (b) the standard MOC, (c) naive FSR, and (d) the MOC with the proposed SSC. In addition, we assessed the robustness of the SSC's performance across models with varying numbers of predictors and indicators. The results showed that the MOC with the proposed SSC yielded smaller mean squared errors than SEM and the standard MOC in small samples and performed similarly to naive FSR. However, naive FSR yielded more biased estimates than the proposed MOC with SSC, by failing to account for measurement error in the factor scores.

Citations: 0
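The core mechanics behind the abstract — naive FSR's attenuation bias and a Croon-style correction of the factor-score variance — can be sketched for a single latent predictor. This is an illustration, not the article's SSC: it assumes the measurement parameters (loadings and unique variances) are known rather than estimated, and all numeric values are invented. Bartlett factor scores equal the latent variable plus an error with variance c = (λ'Θ⁻¹λ)⁻¹; subtracting c from the observed score variance recovers the structural slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
lam = np.array([0.8, 0.7, 0.6])       # factor loadings (assumed known here)
theta = np.array([0.36, 0.51, 0.64])  # unique variances (assumed known here)
beta = 0.5                            # true structural slope

# One latent predictor xi with three indicators, and an outcome y
xi = rng.normal(size=n)
x = xi[:, None] * lam + rng.normal(size=(n, 3)) * np.sqrt(theta)
y = beta * xi + rng.normal(scale=0.5, size=n)

# Bartlett factor scores: unbiased for xi, but contaminated by an error
# term with variance c = (lam' Theta^-1 lam)^-1
w = lam / theta
F = x @ w / (lam @ w)
c = 1.0 / (lam @ w)

# Naive FSR: the slope is attenuated because var(F) = var(xi) + c
beta_naive = np.cov(F, y)[0, 1] / F.var(ddof=1)

# Croon-style correction: remove the factor-score error variance from
# the denominator before regressing
beta_croon = np.cov(F, y)[0, 1] / (F.var(ddof=1) - c)

print(f"true beta:      {beta}")
print(f"naive FSR:      {beta_naive:.3f}")
print(f"Croon-corrected: {beta_croon:.3f}")
```

The naive slope lands well below 0.5 while the corrected one recovers it; the article's contribution concerns stabilizing exactly this kind of correction when n is small and the moment estimates underlying it become noisy.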