
Latest Articles in Educational and Psychological Measurement

Evaluation of Item Fit With Output From the EM Algorithm: RMSD Index Based on Posterior Expectations.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-10-04 | DOI: 10.1177/00131644251369532
Yun-Kyung Kim, Li Cai, YoungKoung Kim

In item response theory modeling, item fit analysis using posterior expectations, otherwise known as pseudocounts, has many advantages. They are readily obtained from the E-step output of the Bock-Aitkin Expectation-Maximization (EM) algorithm and continue to function as a basis for evaluating model fit, even when missing data are present. This paper aimed to improve the interpretability of the root mean squared deviation (RMSD) index based on posterior expectations. In Study 1, we assessed its performance using two approaches. First, we employed the poor person's posterior predictive model checking (PP-PPMC) to compute significance levels for the RMSD values. The resulting Type I error was generally controlled below the nominal level, but power noticeably declined with smaller sample sizes and shorter test lengths. Second, we used receiver operating characteristic (ROC) curve analysis to empirically determine the reference values (cutoff thresholds) that achieve an optimal balance between false-positive and true-positive rates. Importantly, we identified optimal reference values for each combination of sample size and test length in the simulation conditions. The cutoff threshold approach outperformed the PP-PPMC approach, with greater gains in true-positive rates than losses from the inflated false-positive rates. In Study 2, we extended the cutoff threshold approach to conditions with larger sample sizes and longer test lengths. Moreover, we evaluated the performance of the optimized cutoff thresholds under varying levels of data missingness. Finally, we employed response surface analysis to develop a prediction model that generalizes the way the reference values vary with sample size and test length. Overall, this study demonstrates the application of the PP-PPMC for item fit diagnostics and implements a practical frequentist approach to empirically derive reference values. Using our prediction model, practitioners can compute reference values of RMSD that are tailored to their dataset's sample size and test length.
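The RMSD index described above can be illustrated with a small sketch. This is not the authors' implementation: it simply compares the pseudo-observed item characteristic curve (expected correct pseudocounts divided by expected total pseudocounts at each quadrature node, taken from the EM E-step) with the model-implied curve, weighting by the posterior mass at each node. The function name and array layout are assumptions.

```python
import numpy as np

def rmsd_item_fit(pseudo_correct, pseudo_total, model_icc):
    """RMSD between the pseudo-observed and model-implied ICCs for one item.

    pseudo_correct: expected counts of correct responses at each quadrature
        node (E-step pseudocounts).
    pseudo_total: expected total counts at each node.
    model_icc: model-implied probability of a correct response at each node.
    """
    weights = pseudo_total / pseudo_total.sum()   # posterior mass per node
    p_obs = pseudo_correct / pseudo_total         # pseudo-observed ICC
    return np.sqrt(np.sum(weights * (p_obs - model_icc) ** 2))
```

When the pseudo-observed proportions coincide with the model curve, the index is zero; discrepancies concentrated where posterior mass is high contribute most.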

Citations: 0
Impacts of DIF Item Balance and Effect Size Incorporation With the Rasch Tree.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-09-24 | DOI: 10.1177/00131644251370605
Nana Amma Berko Asamoah, Ronna C Turner, Wen-Juo Lo, Brandon L Crawford, Kristen N Jozkowski

Ensuring fairness in educational and psychological assessments is critical, particularly in detecting differential item functioning (DIF), where items perform differently across subgroups. The Rasch tree method, a model-based recursive partitioning approach, is an innovative and flexible DIF detection tool that does not require the pre-specification of focal and reference groups. However, research systematically examining its performance under realistic measurement conditions, such as when multiple DIF items do not consistently favor one subgroup, is limited. This study builds on prior research, evaluating the Rasch tree method's ability to detect DIF by investigating the impact of DIF balance, along with other key factors such as DIF magnitude, sample size, test length, and contamination levels. Additionally, we incorporate the Educational Testing Service effect size heuristic as a criterion, comparing DIF detection rates with those based on statistical significance alone. Results indicate that the Rasch tree has better true DIF detection rates under balanced DIF conditions and large DIF magnitudes. However, its accuracy declines when DIF is unbalanced and the percentage of DIF contamination increases. The use of an effect size reduces the detection of negligible DIF. Caution is recommended with smaller samples, where detection rates are the lowest, especially for larger DIF magnitudes and increased DIF contamination percentages in unbalanced conditions. The study highlights the strengths and limitations of the Rasch tree method under a variety of conditions, underscores the importance of the impact of DIF group imbalance, and provides recommendations for optimizing DIF detection in practical assessment scenarios.
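The Educational Testing Service effect size heuristic mentioned above is commonly stated as an A/B/C classification on the Mantel-Haenszel delta scale. A simplified sketch follows; note that the full ETS rule for category C also requires |D-DIF| to be significantly greater than 1.0, which is collapsed here into a single significance flag.

```python
def ets_dif_category(mh_delta, significant):
    """Simplified ETS A/B/C classification of Mantel-Haenszel D-DIF.

    mh_delta: MH D-DIF value on the ETS delta scale.
    significant: whether the MH chi-square test flagged the item.
    """
    d = abs(mh_delta)
    if not significant or d < 1.0:
        return "A"   # negligible DIF
    if d < 1.5:
        return "B"   # moderate DIF
    return "C"       # large DIF
```

Screening flagged items through such a rule is what suppresses detections of negligible (category A) DIF relative to using statistical significance alone.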

Citations: 0
Using Item Scores and Response Times to Detect Item Compromise in Computerized Adaptive Testing.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-09-14 | DOI: 10.1177/00131644251368335
Chansoon Lee, Kylie Gorney, Jianshen Chen

Sequential procedures have been shown to be effective methods for real-time detection of compromised items in computerized adaptive testing. In this study, we propose three item response theory-based sequential procedures that involve the use of item scores and response times (RTs). The first procedure requires that either the score-based statistic or the RT-based statistic be extreme, the second procedure requires that both the score-based statistic and the RT-based statistic be extreme, and the third procedure requires that a combined score and RT-based statistic be extreme. Results suggest that the third procedure is the most promising, providing a reasonable balance between the false-positive rate and the true-positive rate while also producing relatively short lag times across a wide range of simulation conditions.
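The combination idea behind the third procedure can be sketched in a few lines. This is illustrative only: the paper's actual statistics are IRT-based sequential statistics accumulated across examinees, whereas the sketch below simply pools a standardized score-based statistic and a standardized RT-based statistic into one chi-square statistic with 2 degrees of freedom and flags the item when it is extreme. The function name and the significance level are assumptions.

```python
from math import exp

def combined_flag(z_score, z_rt, alpha=0.01):
    """Flag an item when a combined score + RT statistic is extreme.

    z_score, z_rt: standardized (approximately N(0,1) under no compromise)
        score-based and response-time-based statistics.
    Under the null, z_score**2 + z_rt**2 is chi-square with 2 df, whose
    survival function is exp(-x / 2).
    """
    stat = z_score ** 2 + z_rt ** 2
    p_value = exp(-stat / 2.0)
    return p_value < alpha
```

Requiring one joint statistic to be extreme sits between the "either" rule (more flags, more false positives) and the "both" rule (fewer flags, longer lag times).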

Citations: 0
Dimensionality Assessment in Forced-Choice Questionnaires: First Steps Toward an Exploratory Framework.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-09-08 | DOI: 10.1177/00131644251358226
Diego F Graña, Rodrigo S Kreitchmann, Miguel A Sorrel, Luis Eduardo Garrido, Francisco J Abad

Forced-choice (FC) questionnaires have gained increasing attention as a strategy to reduce social desirability in self-reports, supported by advancements in confirmatory models that address the ipsativity of FC test scores. However, these models assume a known dimensionality and structure, which can be overly restrictive or fail to fit the data adequately. Consequently, exploratory models can be required, with accurate dimensionality assessment as a critical first step. FC questionnaires also pose unique challenges for dimensionality assessment, due to their inherently complex multidimensional structures. Despite this, no prior studies have systematically evaluated dimensionality assessment methods for FC data. To fill this gap, the present study examines five commonly used methods: the Kaiser Criterion, Empirical Kaiser Criterion, Parallel Analysis (PA), Hull Method, and Exploratory Graph Analysis. A Monte Carlo simulation study was conducted, manipulating key design features of FC questionnaires, such as the number of dimensions, items per dimension, response formats (e.g., binary vs. graded), and block composition (e.g., inclusion of heteropolar and unidimensional blocks), as well as factor loadings, inter-factor correlations, and sample size. Results showed that the Maximal Kaiser Criterion and PA methods outperformed the others, achieving higher accuracy and lower bias. Performance improved particularly when heteropolar or unidimensional blocks were included or when the questionnaire length increased. These findings emphasize the importance of thoughtful FC test design and provide practical recommendations for improving dimensionality assessment in this format.
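Of the five retention methods compared, parallel analysis is the easiest to sketch. Below is a minimal PCA-based version of Horn's procedure (retain components whose sample eigenvalues exceed the mean eigenvalues obtained from random normal data of the same shape); it is not the specific variant or settings used in the study.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis on the item correlation matrix.

    Returns the number of components whose observed eigenvalues exceed
    the mean eigenvalues of random-data correlation matrices.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigvalsh returns ascending order; reverse to descending
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sim_eig = np.zeros((n_sims, p))
    for s in range(n_sims):
        random_data = rng.standard_normal((n, p))
        sim_eig[s] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]
    return int(np.sum(obs_eig > sim_eig.mean(axis=0)))
```

Applying this directly to FC data is exactly where the complications arise: ipsative or block-structured responses distort the correlation matrix that both the observed and reference eigenvalues are computed from.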

Citations: 0
Reducing Calibration Bias for Person Fit Assessment by Mixture Model Expansion.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-09-06 | DOI: 10.1177/00131644251364252
Johan Braeken, Saskia van Laar

Measurement appropriateness concerns the question of whether the test or survey scale under consideration can provide a valid measure for a specific individual. An aberrant item response pattern would provide internal counterevidence against using the test/scale for this person, whereas a more typical item response pattern would imply a fit of the measure to the person. Traditional approaches, including the popular Lz person fit statistic, are hampered by their two-stage estimation procedure and the fact that the fit for the person is determined based on the model calibrated on data that include the misfitting persons. This calibration bias creates suboptimal conditions for person fit assessment. Solutions have been sought through the derivation of approximating bias-correction formulas and/or iterative purification procedures. Yet, here we discuss an alternative one-stage solution that involves calibrating a model expansion of the measurement model that includes a mixture component for target aberrant response patterns. A simulation study evaluates the approach under the most unfavorable and least-studied conditions for person fit indices, short polytomous survey scales, similar to those found in large-scale educational assessments such as the Program for International Student Assessment or Trends in Mathematics and Science Study.
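The Lz statistic mentioned above has a closed form: the standardized log-likelihood of a response pattern (Drasgow, Levine, and Williams, 1985). A minimal sketch for dichotomous items follows, assuming the model-implied success probabilities are evaluated at the person's estimated trait level; it illustrates the two-stage nature of the traditional approach, since those probabilities come from a model calibrated beforehand.

```python
import numpy as np

def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic (lz).

    responses: 0/1 item responses for one person.
    probs: model-implied success probabilities at the person's estimated
        theta. Large negative lz suggests an aberrant response pattern.
    """
    x = np.asarray(responses, dtype=float)
    p = np.asarray(probs, dtype=float)
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expected) / np.sqrt(variance)
```

A pattern that follows the model (success where p is high, failure where p is low) yields lz near or above zero; a reversed pattern yields a strongly negative value.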

Citations: 0
Proportion Explained Component Variance in Second-Order Scales: A Note on a Latent Variable Modeling Approach.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-08-23 | DOI: 10.1177/00131644251350536
Tenko Raykov, Christine DiStefano, Yusuf Ransome

A procedure for evaluation of the proportion explained component variance by the underlying trait in behavioral scales with second-order structure is outlined. The resulting index of accounted for variance over all scale components is a useful and informative complement to the conventional omega-hierarchical coefficient as well as the proportion of explained component correlation. A point and interval estimation method is described for the discussed index, which utilizes a confirmatory factor analysis approach within the latent variable modeling methodology. The procedure can be used with widely available software and is illustrated on data.
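For orientation, the conventional omega-hierarchical coefficient that the proposed index complements is the squared sum of the general-factor loadings divided by the total score variance. A minimal sketch (not the authors' latent variable modeling procedure; the array layout is an assumption):

```python
import numpy as np

def omega_hierarchical(general_loadings, cov_total):
    """Omega-hierarchical: proportion of total score variance attributable
    to the general factor.

    general_loadings: loadings of each item on the general factor.
    cov_total: model-implied (or sample) covariance matrix of the items;
        the total score variance is the sum of all its entries.
    """
    return np.sum(general_loadings) ** 2 / np.sum(cov_total)
```

The index discussed in the note instead targets explained variance at the level of the individual scale components, rather than of the total score.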

Citations: 0
How to Improve the Regression Factor Score Predictor When Individuals Have Different Factor Loadings.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-08-15 | DOI: 10.1177/00131644251347530
André Beauducel, Norbert Hilger, Anneke C Weide

Previous research has shown that ignoring individual differences of factor loadings in conventional factor models may reduce the determinacy of factor score predictors. Therefore, the aim of the present study is to propose a heterogeneous regression factor score predictor (HRFS) with larger determinacy than the conventional regression factor score predictor (RFS) when individuals have different factor loadings. First, a method for the estimation of individual loadings is proposed. The individual loading estimates are used to compute the HRFS. Then, a binomial test for loading heterogeneity of a factor is proposed to compute the HRFS only when the test is significant. Otherwise, the conventional RFS should be used. A simulation study reveals that the HRFS has larger determinacy than the conventional RFS in populations with substantial loading heterogeneity. An empirical example based on subsamples drawn randomly from a large sample of Big Five Markers indicates that the determinacy can be improved for the factor emotional stability when the HRFS is computed.
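The conventional (homogeneous) regression factor score predictor that the HRFS generalizes can be sketched for a one-factor model: scores are a weighted sum of the observed variables with weights Sigma^{-1} lambda, and determinacy is sqrt(lambda' Sigma^{-1} lambda). The HRFS itself would substitute individual loading estimates for the common lambda, which is not shown here; the function name is an assumption.

```python
import numpy as np

def regression_factor_scores(loadings, uniquenesses, data):
    """Conventional regression factor score predictor, one-factor model.

    loadings: common factor loadings lambda (length p).
    uniquenesses: unique variances (length p), so Sigma = lambda lambda' + Psi.
    data: (n, p) matrix of (centered) observed scores.
    Returns the factor score predictor and its determinacy.
    """
    lam = np.asarray(loadings, dtype=float)
    sigma = np.outer(lam, lam) + np.diag(uniquenesses)
    weights = np.linalg.solve(sigma, lam)         # Sigma^{-1} lambda
    determinacy = np.sqrt(lam @ weights)          # sqrt(lambda' Sigma^{-1} lambda)
    return data @ weights, determinacy
```

Determinacy below 1 reflects the indeterminacy of the factor; ignoring person-specific loadings, the study argues, depresses it further when loadings are truly heterogeneous.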

Citations: 0
A Comparison of LTA Models With and Without Residual Correlation in Estimating Transition Probabilities.
IF 2.3 | Psychology (CAS Tier 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2025-08-14 | DOI: 10.1177/00131644251358530
Na Yeon Lee, Sojin Yoon, Sehee Hong

In longitudinal mixture models such as latent transition analysis (LTA), identical items are repeatedly measured across multiple time points to define latent classes, and individuals tend to show similar response patterns over time, which gives rise to residual correlations. Therefore, this study hypothesized that an LTA model assuming residual correlations among indicator variables measured repeatedly across multiple time points would provide more accurate estimates of transition probabilities than a traditional LTA model. To test this hypothesis, a Monte Carlo simulation was conducted to generate data both with and without specified residual correlations among the repeatedly measured indicator variables, and the two LTA models, one that accounted for residual correlations and one that did not, were compared. This study included transition probabilities, numbers of indicator variables, sample sizes, and levels of residual correlations as the simulation conditions. The estimation performances were compared based on parameter estimate bias, mean squared error, and coverage. The results demonstrate that LTA with residual correlations outperforms traditional LTA in estimating transition probabilities, and the differences between the two models become prominent when the residual correlation is .3 or higher. This research integrates the characteristics of longitudinal data in an LTA simulation study and suggests an improved version of LTA estimation.
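For readers new to the quantity being estimated: a transition probability matrix has rows tau[a, :] = P(class b at time 2 | class a at time 1). The sketch below computes it from hard class assignments purely for illustration; an actual LTA estimates tau jointly with the measurement model from posterior class probabilities, which is where the residual-correlation assumption matters.

```python
import numpy as np

def transition_matrix(class_t1, class_t2, n_classes):
    """Row-normalized crosstab of latent class memberships at two time
    points: tau[a, b] = P(class b at T2 | class a at T1)."""
    counts = np.zeros((n_classes, n_classes))
    for a, b in zip(class_t1, class_t2):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

Each row sums to 1; bias in these entries is exactly what the simulation study evaluates under the two model specifications.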

The Dominant Trait Profile Method of Scoring Multidimensional Forced-Choice Questionnaires.
IF 2.3 · CAS Tier 3 (Psychology) · JCR Q2 (Mathematics, Interdisciplinary Applications) · Pub Date: 2025-08-14 · DOI: 10.1177/00131644251360386
Dimiter M Dimitrov

Proposed is a new method of scoring multidimensional forced-choice (MFC) questionnaires referred to as the dominant trait profile (DTP) method. The DTP method identifies a dominant response vector (DRV) for each trait, that is, a vector of binary scores for preferences in item pairs within MFC blocks from the perspective of a respondent for whom the trait under consideration dominates over the other traits being measured. The respondents' observed response vectors are matched to the DRV for each trait to produce (1/0) matching scores that are then analyzed via latent trait modeling, with scaling options of (a) a bounded D-scale (from 0 to 1) or (b) an item response theory logit scale. The DTP method allows for the comparison of individuals on a trait of interest, as well as their standing in relation to a dominant trait "standard" (criterion). The study results indicate that DTP-based trait estimates are highly correlated with those produced by the popular Thurstonian item response theory model and the Zinnes and Griggs pairwise preference item response theory model, while avoiding the complexity of their designs and some computational issues.
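The matching step described above (comparing an observed response vector to a trait's DRV to produce 1/0 matching scores) can be sketched directly. The function below is a minimal illustration under assumed names; the abstract's actual scaling runs the matching scores through latent trait modeling, whereas the simple proportion returned here only mimics the bounded 0-to-1 range of the D-scale.

```python
def dtp_matching_scores(observed, drv):
    """Score one respondent's trait against a dominant response vector.

    observed, drv: equal-length binary preference vectors, one entry
    per item pair within the MFC blocks.
    Returns the per-pair (1/0) matching scores and the proportion of
    matches (a bounded 0-1 summary; illustrative only).
    """
    if len(observed) != len(drv):
        raise ValueError("observed and DRV must have the same length")
    matches = [1 if o == d else 0 for o, d in zip(observed, drv)]
    return matches, sum(matches) / len(matches)
```

A respondent whose preferences fully reproduce the DRV for a trait would score 1.0 on that trait's summary, the "dominant trait standard" end of the scale.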

Human Expertise and Large Language Model Embeddings in the Content Validity Assessment of Personality Tests.
IF 2.3 · CAS Tier 3 (Psychology) · JCR Q2 (Mathematics, Interdisciplinary Applications) · Pub Date: 2025-08-14 · DOI: 10.1177/00131644251355485
Nicola Milano, Michela Ponticorvo, Davide Marocco

In this article, we explore the application of Large Language Models (LLMs) in assessing the content validity of psychometric instruments, focusing on the Big Five Questionnaire (BFQ) and Big Five Inventory (BFI). Content validity, a cornerstone of test construction, ensures that psychological measures adequately cover their intended constructs. Using both human expert evaluations and advanced LLMs, we compared the accuracy of semantic item-construct alignment. Graduate psychology students employed the Content Validity Ratio to rate test items, forming the human baseline. In parallel, state-of-the-art LLMs, including multilingual and fine-tuned models, analyzed item embeddings to predict construct mappings. The results reveal distinct strengths and limitations of human and AI approaches. Human validators excelled in aligning the behaviorally rich BFQ items, while LLMs performed better with the linguistically concise BFI items. Training strategies significantly influenced LLM performance, with models tailored for lexical relationships outperforming general-purpose LLMs. Here we highlight the complementary potential of hybrid validation systems that integrate human expertise and AI precision. The findings underscore the transformative role of LLMs in psychological assessment, paving the way for scalable, objective, and robust test development methodologies.
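The core of embedding-based item-construct alignment is mapping each item's embedding to the construct whose reference embedding it is most similar to, typically by cosine similarity. The sketch below is a generic illustration with assumed names and toy vectors, not the authors' pipeline; in practice the vectors would come from an LLM embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def predict_construct(item_vec, construct_vecs):
    """Assign an item embedding to the nearest construct.

    construct_vecs: dict mapping construct name -> reference embedding
    (e.g., the embedding of the construct's definition).
    Returns the name of the most similar construct.
    """
    return max(construct_vecs,
               key=lambda name: cosine(item_vec, construct_vecs[name]))
```

Agreement between these predicted mappings and the test blueprint (or the human raters' Content Validity Ratio judgments) is then what quantifies semantic item-construct alignment.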

Journal: Educational and Psychological Measurement