Pub Date : 2025-10-23DOI: 10.1177/00131644251371187
Ludovica De Carolis, Minjeong Jeon
This study compares two network-based approaches for analyzing binary psychological assessment data: network psychometrics and latent space item response modeling (LSIRM). Network psychometrics, a well-established method, infers relationships among items or symptoms based on pairwise conditional dependencies. In contrast, LSIRM is a more recent framework that represents item responses as a bipartite network of respondents and items embedded in a latent metric space, where the likelihood of a response decreases with increasing distance between the respondent and item. We evaluate the performance of both methods through simulation studies under varying data-generating conditions. In addition, we demonstrate their applications to real assessment data, showcasing the distinct insights each method offers to researchers and practitioners.
{"title":"Network Approaches to Binary Assessment Data: Network Psychometrics Versus Latent Space Item Response Models.","authors":"Ludovica De Carolis, Minjeong Jeon","doi":"10.1177/00131644251371187","DOIUrl":"10.1177/00131644251371187","url":null,"abstract":"<p><p>This study compares two network-based approaches for analyzing binary psychological assessment data: network psychometrics and latent space item response modeling (LSIRM). Network psychometrics, a well-established method, infers relationships among items or symptoms based on pairwise conditional dependencies. In contrast, LSIRM is a more recent framework that represents item responses as a bipartite network of respondents and items embedded in a latent metric space, where the likelihood of a response decreases with increasing distance between the respondent and item. We evaluate the performance of both methods through simulation studies under varying data-generating conditions. In addition, we demonstrate their applications to real assessment data, showcasing the distinct insights each method offers to researchers and practitioners.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251371187"},"PeriodicalIF":2.3,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12549609/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145376583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-07DOI: 10.1177/00131644251374302
Georgios D Sideridis, Mohammed Alghamdi
The three-parameter logistic (3PL) model in item-response theory (IRT) has long been used to account for guessing in multiple-choice assessments through a fixed item-level parameter. However, this approach treats guessing as a property of the test item rather than the individual, potentially misrepresenting the cognitive processes underlying the examinee's behavior. This study evaluates a novel alternative, the Two-Parameter Logistic Extension (2PLE) model, which re-conceptualizes guessing as a function of a person's ability rather than as an item-specific constant. Using Monte Carlo simulation and empirical data from the PIRLS 2021 reading comprehension assessment, we compared the 3PL and 2PLE models on the recovery of latent ability, predictive fit (Leave-One-Out Information Criterion [LOOIC]), and theoretical alignment with test-taking behavior. The simulation results demonstrated that although both models performed similarly in terms of root-mean-squared error (RMSE) for ability estimates, the 2PLE model consistently achieved superior LOOIC values across conditions, particularly with longer tests and larger sample sizes. In an empirical analysis involving the reading achievement of 131 fourth-grade students from Saudi Arabia, model comparison again favored 2PLE, with a statistically significant LOOIC difference (ΔLOOIC = 0.482, z = 2.54). Importantly, person-level guessing estimates derived from the 2PLE model were significantly associated with established person-fit statistics (C*, U3), supporting their criterion validity. These findings suggest that the 2PLE model provides a more cognitively plausible and statistically robust representation of examinee behavior by embedding an ability-dependent guessing function.
项目反应理论(IRT)中的三参数逻辑模型(3PL)长期以来被用于解释多项选择评估中通过固定的项目水平参数进行猜测。然而,这种方法将猜测视为测试项目的属性而不是个人的属性,可能会歪曲考生行为背后的认知过程。本研究评估了一种新的替代方案,即双参数逻辑扩展(2PLE)模型,该模型将猜测重新定义为一个人的能力的函数,而不是特定项目的常数。利用蒙特卡罗模拟和PIRLS 2021阅读理解评估的经验数据,我们比较了3PL和2PLE模型在潜在能力恢复、预测拟合(留一信息标准[LOOIC])以及理论与考试行为的一致性方面的差异。模拟结果表明,尽管两种模型在能力估计的均方根误差(RMSE)方面表现相似,但2PLE模型在各种条件下始终获得更好的LOOIC值,特别是在更长的测试时间和更大的样本量下。在对131名沙特阿拉伯四年级学生阅读成绩的实证分析中,模型比较再次倾向于2PLE,其LOOIC差异具有统计学意义(ΔLOOIC = 0.482, z = 2.54)。重要的是,从2PLE模型中得出的个人水平猜测估计值与已建立的个人拟合统计量显著相关(C*, U3),支持其标准有效性。这些发现表明,通过嵌入能力依赖的猜测函数,2PLE模型提供了一个更可信的认知和统计稳健的考生行为表征。
{"title":"Guessing During Testing is a Person Attribute Not an Instrument Parameter.","authors":"Georgios D Sideridis, Mohammed Alghamdi","doi":"10.1177/00131644251374302","DOIUrl":"10.1177/00131644251374302","url":null,"abstract":"<p><p>The three-parameter logistic (3PL) model in item-response theory (IRT) has long been used to account for guessing in multiple-choice assessments through a fixed item-level parameter. However, this approach treats guessing as a property of the test item rather than the individual, potentially misrepresenting the cognitive processes underlying the examinee's behavior. This study evaluates a novel alternative, the Two-Parameter Logistic Extension (2PLE) model, which re-conceptualizes guessing as a function of a person's ability rather than as an item-specific constant. Using Monte Carlo simulation and empirical data from the PIRLS 2021 reading comprehension assessment, we compared the 3PL and 2PLE models on the recovery of latent ability, predictive fit (Leave-One-Out Information Criterion [LOOIC]), and theoretical alignment with test-taking behavior. The simulation results demonstrated that although both models performed similarly in terms of root-mean-squared error (RMSE) for ability estimates, the 2PLE model consistently achieved superior LOOIC values across conditions, particularly with longer tests and larger sample sizes. In an empirical analysis involving the reading achievement of 131 fourth-grade students from Saudi Arabia, model comparison again favored 2PLE, with a statistically significant LOOIC difference (ΔLOOIC = 0.482, <i>z</i> = 2.54). Importantly, person-level guessing estimates derived from the 2PLE model were significantly associated with established person-fit statistics (C*, U3), supporting their criterion validity. These findings suggest that the 2PLE model provides a more cognitively plausible and statistically robust representation of examinee behavior by embedding an ability-dependent guessing function.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251374302"},"PeriodicalIF":2.3,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12504208/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145257555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-04DOI: 10.1177/00131644251369532
Yun-Kyung Kim, Li Cai, YoungKoung Kim
In item response theory modeling, item fit analysis using posterior expectations, otherwise known as pseudocounts, has many advantages. They are readily obtained from the E-step output of the Bock-Aitkin Expectation-Maximization (EM) algorithm and continue to function as a basis of evaluating model fit, even when missing data are present. This paper aimed to improve the interpretability of the root mean squared deviation (RMSD) index based on posterior expectations. In Study 1, we assessed its performance using two approaches. First, we employed the poor person's posterior predictive model checking (PP-PPMC) to compute their significance levels. The resulting Type I error was generally controlled below the nominal level, but power noticeably declined with smaller sample sizes and shorter test lengths. Second, we used receiver operating characteristic (ROC) curve analysis (±) to empirically determine the reference values (cutoff thresholds) that achieve an optimal balance between false-positive and true-positive rates. Importantly, we identified optimal reference values for each combination of sample size and test length in the simulation conditions. The cutoff threshold approach outperformed the PP-PPMC approach with greater gains in true-positive rates than losses from the inflated false-positive rates. In Study 2, we extended the cutoff threshold approach to conditions with larger sample sizes and longer test lengths. Moreover, we evaluated the performance of the optimized cutoff thresholds under varying levels of data missingness. Finally, we employed response surface analysis (±) to develop a prediction model that generalizes the way the reference values vary with sample size and test length. Overall, this study demonstrates the application of the PP-PPMC for item fit diagnostics and implements a practical frequentist approach to empirically derive reference values. Using our prediction model, practitioners can compute the reference values of RMSD that are tailored to their dataset's sample size and test length.
{"title":"Evaluation of Item Fit With Output From the EM Algorithm: RMSD Index Based on Posterior Expectations.","authors":"Yun-Kyung Kim, Li Cai, YoungKoung Kim","doi":"10.1177/00131644251369532","DOIUrl":"10.1177/00131644251369532","url":null,"abstract":"<p><p>In item response theory modeling, item fit analysis using posterior expectations, otherwise known as pseudocounts, has many advantages. They are readily obtained from the E-step output of the Bock-Aitkin Expectation-Maximization (EM) algorithm and continue to function as a basis of evaluating model fit, even when missing data are present. This paper aimed to improve the interpretability of the root mean squared deviation (RMSD) index based on posterior expectations. In Study 1, we assessed its performance using two approaches. First, we employed the poor person's posterior predictive model checking (PP-PPMC) to compute their significance levels. The resulting Type I error was generally controlled below the nominal level, but power noticeably declined with smaller sample sizes and shorter test lengths. Second, we used receiver operating characteristic (ROC) curve analysis (±) to empirically determine the reference values (cutoff thresholds) that achieve an optimal balance between false-positive and true-positive rates. Importantly, we identified optimal reference values for each combination of sample size and test length in the simulation conditions. The cutoff threshold approach outperformed the PP-PPMC approach with greater gains in true-positive rates than losses from the inflated false-positive rates. In Study 2, we extended the cutoff threshold approach to conditions with larger sample sizes and longer test lengths. Moreover, we evaluated the performance of the optimized cutoff thresholds under varying levels of data missingness. Finally, we employed response surface analysis (±) to develop a prediction model that generalizes the way the reference values vary with sample size and test length. Overall, this study demonstrates the application of the PP-PPMC for item fit diagnostics and implements a practical frequentist approach to empirically derive reference values. Using our prediction model, practitioners can compute the reference values of RMSD that are tailored to their dataset's sample size and test length.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251369532"},"PeriodicalIF":2.3,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12496452/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145238234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-24DOI: 10.1177/00131644251370605
Nana Amma Berko Asamoah, Ronna C Turner, Wen-Juo Lo, Brandon L Crawford, Kristen N Jozkowski
Ensuring fairness in educational and psychological assessments is critical, particularly in detecting differential item functioning (DIF), where items perform differently across subgroups. The Rasch tree method, a model-based recursive partitioning approach, is an innovative and flexible DIF detection tool that does not require the pre-specification of focal and reference groups. However, research systematically examining its performance under realistic measurement conditions, such as when multiple DIF items do not consistently favor one subgroup, is limited. This study builds on prior research, evaluating the Rasch tree method's ability to detect DIF by investigating the impact of DIF balance, along with other key factors such as DIF magnitude, sample size, test length, and contamination levels. Additionally, we incorporate the Educational Testing Service effect size heuristic as a criterion to compare the DIF detection rate performance with only statistical significance. Results indicate that the Rasch tree has better true DIF detection rates under balanced DIF conditions and large DIF magnitudes. However, its accuracy declines when DIF is unbalanced and the percentage of DIF contamination increases. The use of an effect size reduces the detection of negligible DIF. Caution is recommended with smaller samples, where detection rates are the lowest, especially for larger DIF magnitudes and increased DIF contamination percentages in unbalanced conditions. The study highlights the strengths and limitations of the Rasch tree method under a variety of conditions, underscores the importance of the impact of DIF group imbalance, and provides recommendations for optimizing DIF detection in practical assessment scenarios.
{"title":"Impacts of DIF Item Balance and Effect Size Incorporation With the Rasch Tree.","authors":"Nana Amma Berko Asamoah, Ronna C Turner, Wen-Juo Lo, Brandon L Crawford, Kristen N Jozkowski","doi":"10.1177/00131644251370605","DOIUrl":"10.1177/00131644251370605","url":null,"abstract":"<p><p>Ensuring fairness in educational and psychological assessments is critical, particularly in detecting differential item functioning (DIF), where items perform differently across subgroups. The Rasch tree method, a model-based recursive partitioning approach, is an innovative and flexible DIF detection tool that does not require the pre-specification of focal and reference groups. However, research systematically examining its performance under realistic measurement conditions, such as when multiple DIF items do not consistently favor one subgroup, is limited. This study builds on prior research, evaluating the Rasch tree method's ability to detect DIF by investigating the impact of DIF balance, along with other key factors such as DIF magnitude, sample size, test length, and contamination levels. Additionally, we incorporate the Educational Testing Service effect size heuristic as a criterion to compare the DIF detection rate performance with only statistical significance. Results indicate that the Rasch tree has better true DIF detection rates under balanced DIF conditions and large DIF magnitudes. However, its accuracy declines when DIF is unbalanced and the percentage of DIF contamination increases. The use of an effect size reduces the detection of negligible DIF. Caution is recommended with smaller samples, where detection rates are the lowest, especially for larger DIF magnitudes and increased DIF contamination percentages in unbalanced conditions. The study highlights the strengths and limitations of the Rasch tree method under a variety of conditions, underscores the importance of the impact of DIF group imbalance, and provides recommendations for optimizing DIF detection in practical assessment scenarios.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251370605"},"PeriodicalIF":2.3,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12463886/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145184997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-14DOI: 10.1177/00131644251368335
Chansoon Lee, Kylie Gorney, Jianshen Chen
Sequential procedures have been shown to be effective methods for real-time detection of compromised items in computerized adaptive testing. In this study, we propose three item response theory-based sequential procedures that involve the use of item scores and response times (RTs). The first procedure requires that either the score-based statistic or the RT-based statistic be extreme, the second procedure requires that both the score-based statistic and the RT-based statistic be extreme, and the third procedure requires that a combined score and RT-based statistic be extreme. Results suggest that the third procedure is the most promising, providing a reasonable balance between the false-positive rate and the true-positive rate while also producing relatively short lag times across a wide range of simulation conditions.
{"title":"Using Item Scores and Response Times to Detect Item Compromise in Computerized Adaptive Testing.","authors":"Chansoon Lee, Kylie Gorney, Jianshen Chen","doi":"10.1177/00131644251368335","DOIUrl":"10.1177/00131644251368335","url":null,"abstract":"<p><p>Sequential procedures have been shown to be effective methods for real-time detection of compromised items in computerized adaptive testing. In this study, we propose three item response theory-based sequential procedures that involve the use of item scores and response times (RTs). The first procedure requires that either the score-based statistic or the RT-based statistic be extreme, the second procedure requires that both the score-based statistic and the RT-based statistic be extreme, and the third procedure requires that a combined score and RT-based statistic be extreme. Results suggest that the third procedure is the most promising, providing a reasonable balance between the false-positive rate and the true-positive rate while also producing relatively short lag times across a wide range of simulation conditions.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251368335"},"PeriodicalIF":2.3,"publicationDate":"2025-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12433998/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145074512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-08DOI: 10.1177/00131644251358226
Diego F Graña, Rodrigo S Kreitchmann, Miguel A Sorrel, Luis Eduardo Garrido, Francisco J Abad
Forced-choice (FC) questionnaires have gained increasing attention as a strategy to reduce social desirability in self-reports, supported by advancements in confirmatory models that address the ipsativity of FC test scores. However, these models assume a known dimensionality and structure, which can be overly restrictive or fail to fit the data adequately. Consequently, exploratory models can be required, with accurate dimensionality assessment as a critical first step. FC questionnaires also pose unique challenges for dimensionality assessment, due to their inherently complex multidimensional structures. Despite this, no prior studies have systematically evaluated dimensionality assessment methods for FC data. To fill this gap, the present study examines five commonly used methods: the Kaiser Criterion, Empirical Kaiser Criterion, Parallel Analysis (PA), Hull Method, and Exploratory Graph Analysis. A Monte Carlo simulation study was conducted, manipulating key design features of FC questionnaires, such as the number of dimensions, items per dimension, response formats (e.g., binary vs. graded), and block composition (e.g., inclusion of heteropolar and unidimensional blocks), as well as factor loadings, inter-factor correlations, and sample size. Results showed that the Maximal Kaiser Criterion and PA methods outperformed the others, achieving higher accuracy and lower bias. Performance improved particularly when heteropolar or unidimensional blocks were included or when the questionnaire length increased. These findings emphasize the importance of thoughtful FC test design and provide practical recommendations for improving dimensionality assessment in this format.
{"title":"Dimensionality Assessment in Forced-Choice Questionnaires: First Steps Toward an Exploratory Framework.","authors":"Diego F Graña, Rodrigo S Kreitchmann, Miguel A Sorrel, Luis Eduardo Garrido, Francisco J Abad","doi":"10.1177/00131644251358226","DOIUrl":"10.1177/00131644251358226","url":null,"abstract":"<p><p>Forced-choice (FC) questionnaires have gained increasing attention as a strategy to reduce social desirability in self-reports, supported by advancements in confirmatory models that address the ipsativity of FC test scores. However, these models assume a known dimensionality and structure, which can be overly restrictive or fail to fit the data adequately. Consequently, exploratory models can be required, with accurate dimensionality assessment as a critical first step. FC questionnaires also pose unique challenges for dimensionality assessment, due to their inherently complex multidimensional structures. Despite this, no prior studies have systematically evaluated dimensionality assessment methods for FC data. To fill this gap, the present study examines five commonly used methods: the Kaiser Criterion, Empirical Kaiser Criterion, Parallel Analysis (PA), Hull Method, and Exploratory Graph Analysis. A Monte Carlo simulation study was conducted, manipulating key design features of FC questionnaires, such as the number of dimensions, items per dimension, response formats (e.g., binary vs. graded), and block composition (e.g., inclusion of heteropolar and unidimensional blocks), as well as factor loadings, inter-factor correlations, and sample size. Results showed that the Maximal Kaiser Criterion and PA methods outperformed the others, achieving higher accuracy and lower bias. Performance improved particularly when heteropolar or unidimensional blocks were included or when the questionnaire length increased. These findings emphasize the importance of thoughtful FC test design and provide practical recommendations for improving dimensionality assessment in this format.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251358226"},"PeriodicalIF":2.3,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12420653/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145039408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-06DOI: 10.1177/00131644251364252
Johan Braeken, Saskia van Laar
Measurement appropriateness concerns the question of whether the test or survey scale under consideration can provide a valid measure for a specific individual. An aberrant item response pattern would provide internal counterevidence against using the test/scale for this person, whereas a more typical item response pattern would imply a fit of the measure to the person. Traditional approaches, including the popular Lz person fit statistic, are hampered by their two-stage estimation procedure and the fact that the fit for the person is determined based on the model calibrated on data that include the misfitting persons. This calibration bias creates suboptimal conditions for person fit assessment. Solutions have been sought through the derivation of approximating bias-correction formulas and/or iterative purification procedures. Yet, here we discuss an alternative one-stage solution that involves calibrating a model expansion of the measurement model that includes a mixture component for target aberrant response patterns. A simulation study evaluates the approach under the most unfavorable and least-studied conditions for person fit indices, short polytomous survey scales, similar to those found in large-scale educational assessments such as the Program for International Student Assessment or Trends in Mathematics and Science Study.
{"title":"Reducing Calibration Bias for Person Fit Assessment by Mixture Model Expansion.","authors":"Johan Braeken, Saskia van Laar","doi":"10.1177/00131644251364252","DOIUrl":"10.1177/00131644251364252","url":null,"abstract":"<p><p>Measurement appropriateness concerns the question of whether the test or survey scale under consideration can provide a valid measure for a specific individual. An aberrant item response pattern would provide internal counterevidence against using the test/scale for this person, whereas a more typical item response pattern would imply a fit of the measure to the person. Traditional approaches, including the popular Lz person fit statistic, are hampered by their two-stage estimation procedure and the fact that the fit for the person is determined based on the model calibrated on data that include the misfitting persons. This calibration bias creates suboptimal conditions for person fit assessment. Solutions have been sought through the derivation of approximating bias-correction formulas and/or iterative purification procedures. Yet, here we discuss an alternative one-stage solution that involves calibrating a model expansion of the measurement model that includes a mixture component for target aberrant response patterns. A simulation study evaluates the approach under the most unfavorable and least-studied conditions for person fit indices, short polytomous survey scales, similar to those found in large-scale educational assessments such as the Program for International Student Assessment or Trends in Mathematics and Science Study.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251364252"},"PeriodicalIF":2.3,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12413990/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145023055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-08-23DOI: 10.1177/00131644251350536
Tenko Raykov, Christine DiStefano, Yusuf Ransome
A procedure for evaluation of the proportion explained component variance by the underlying trait in behavioral scales with second-order structure is outlined. The resulting index of accounted for variance over all scale components is a useful and informative complement to the conventional omega-hierarchical coefficient as well as the proportion of explained component correlation. A point and interval estimation method is described for the discussed index, which utilizes a confirmatory factor analysis approach within the latent variable modeling methodology. The procedure can be used with widely available software and is illustrated on data.
{"title":"Proportion Explained Component Variance in Second-Order Scales: A Note on a Latent Variable Modeling Approach.","authors":"Tenko Raykov, Christine DiStefano, Yusuf Ransome","doi":"10.1177/00131644251350536","DOIUrl":"https://doi.org/10.1177/00131644251350536","url":null,"abstract":"<p><p>A procedure for evaluation of the proportion explained component variance by the underlying trait in behavioral scales with second-order structure is outlined. The resulting index of accounted for variance over all scale components is a useful and informative complement to the conventional omega-hierarchical coefficient as well as the proportion of explained component correlation. A point and interval estimation method is described for the discussed index, which utilizes a confirmatory factor analysis approach within the latent variable modeling methodology. The procedure can be used with widely available software and is illustrated on data.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251350536"},"PeriodicalIF":2.3,"publicationDate":"2025-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12374956/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144946890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-08-15DOI: 10.1177/00131644251347530
André Beauducel, Norbert Hilger, Anneke C Weide
Previous research has shown that ignoring individual differences of factor loadings in conventional factor models may reduce the determinacy of factor score predictors. Therefore, the aim of the present study is to propose a heterogeneous regression factor score predictor (HRFS) with larger determinacy than the conventional regression factor score predictor (RFS) when individuals have different factor loadings. First, a method for the estimation of individual loadings is proposed. The individual loading estimates are used to compute the HRFS. Then, a binomial test for loading heterogeneity of a factor is proposed to compute the HRFS only when the test is significant. Otherwise, the conventional RFS should be used. A simulation study reveals that the HRFS has larger determinacy than the conventional RFS in populations with substantial loading heterogeneity. An empirical example based on subsamples drawn randomly from a large sample of Big Five Markers indicates that the determinacy can be improved for the factor emotional stability when the HRFS is computed.
{"title":"How to Improve the Regression Factor Score Predictor When Individuals Have Different Factor Loadings.","authors":"André Beauducel, Norbert Hilger, Anneke C Weide","doi":"10.1177/00131644251347530","DOIUrl":"10.1177/00131644251347530","url":null,"abstract":"<p><p>Previous research has shown that ignoring individual differences of factor loadings in conventional factor models may reduce the determinacy of factor score predictors. Therefore, the aim of the present study is to propose a heterogeneous regression factor score predictor (HRFS) with larger determinacy than the conventional regression factor score predictor (RFS) when individuals have different factor loadings. First, a method for the estimation of individual loadings is proposed. The individual loading estimates are used to compute the HRFS. Then, a binomial test for loading heterogeneity of a factor is proposed to compute the HRFS only when the test is significant. Otherwise, the conventional RFS should be used. A simulation study reveals that the HRFS has larger determinacy than the conventional RFS in populations with substantial loading heterogeneity. An empirical example based on subsamples drawn randomly from a large sample of Big Five Markers indicates that the determinacy can be improved for the factor emotional stability when the HRFS is computed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251347530"},"PeriodicalIF":2.3,"publicationDate":"2025-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356820/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144872005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-08-14DOI: 10.1177/00131644251358530
Na Yeon Lee, Sojin Yoon, Sehee Hong
In longitudinal mixture models like latent transition analysis (LTA), identical items are often repeatedly measured across multiple time points to define latent classes and individuals' similar response patterns across multiple time points, which attributes to residual correlations. Therefore, this study hypothesized that an LTA model assuming residual correlations among indicator variables measured repeatedly across multiple time points would provide more accurate estimates of transition probabilities than a traditional LTA model. To test this hypothesis, a Monte Carlo simulation was conducted to generate data both with and without specified residual correlations among the repeatedly measured indicator variables, and the two LTA models-one that accounted for residual correlations and one that did not-were compared. This study included transition probabilities, numbers of indicator variables, sample sizes, and levels of residual correlations as the simulation conditions. The estimation performances were compared based on parameter estimate bias, mean squared error, and coverage. The results demonstrate that LTA with residual correlations outperforms traditional LTA in estimating transition probabilities, and the differences between the two models become prominent when the residual correlation is .3 or higher. This research integrates the characteristics of longitudinal data in an LTA simulation study and suggests an improved version of LTA estimation.
{"title":"A Comparison of LTA Models with and Without Residual Correlation in Estimating Transition Probabilities.","authors":"Na Yeon Lee, Sojin Yoon, Sehee Hong","doi":"10.1177/00131644251358530","DOIUrl":"10.1177/00131644251358530","url":null,"abstract":"<p><p>In longitudinal mixture models like latent transition analysis (LTA), identical items are often repeatedly measured across multiple time points to define latent classes and individuals' similar response patterns across multiple time points, which attributes to residual correlations. Therefore, this study hypothesized that an LTA model assuming residual correlations among indicator variables measured repeatedly across multiple time points would provide more accurate estimates of transition probabilities than a traditional LTA model. To test this hypothesis, a Monte Carlo simulation was conducted to generate data both with and without specified residual correlations among the repeatedly measured indicator variables, and the two LTA models-one that accounted for residual correlations and one that did not-were compared. This study included transition probabilities, numbers of indicator variables, sample sizes, and levels of residual correlations as the simulation conditions. The estimation performances were compared based on parameter estimate bias, mean squared error, and coverage. The results demonstrate that LTA with residual correlations outperforms traditional LTA in estimating transition probabilities, and the differences between the two models become prominent when the residual correlation is .3 or higher. This research integrates the characteristics of longitudinal data in an LTA simulation study and suggests an improved version of LTA estimation.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251358530"},"PeriodicalIF":2.3,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356818/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144872004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}