Factor score indeterminacy is a characteristic property of factor analysis (FA) models. This research introduces a novel procedure, regression-based factor score exploration (RFE), which uniquely determines factor scores and simultaneously estimates other parameters of the FA model. RFE uniquely determines factor scores by minimizing a loss function that balances FA and multivariate regression, regulated by a tuning parameter. Theoretical aspects of RFE, including the uniqueness of factor scores, the relationship between observed and latent variables, and rotational indeterminacy, are examined. Additionally, clustering-based factor exploration (CFE) is presented as a variant of RFE, derived by generalizing the penalty term to enable the clustering of factor scores. It is demonstrated that CFE creates cluster structures more accurately than the existing method. A simulation study shows that the proposed procedures accurately recover true parameter matrices even in the presence of error-contaminated data, with lower computational demand compared to existing methods. Real data examples illustrate that the proposed procedures provide interpretable results, demonstrating high relevance to the factor scores obtained by existing methods.
Naoto Yamashita. Identification of Factor Scores by Regression with External Variables in Exploratory Factor Analysis. Psychometrika, pp. 1-14, published 2025-06-16. doi:10.1017/psy.2025.10025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483718/pdf/
This article proposes a new statistical model to infer interpretable population-level preferences from ordinal comparison data. Such data is ubiquitous, e.g., ranked choice votes, top-10 movie lists, and pairwise sports outcomes. Traditional statistical inference on ordinal comparison data results in an overall ranking of objects, e.g., from best to worst, with each object having a unique rank. However, the ranks of some objects may not be statistically distinguishable. This could happen due to insufficient data or to the true underlying object qualities being equal. Because uncertainty communication in estimates of overall rankings is notoriously difficult, we take a different approach and allow groups of objects to have equal ranks or be rank-clustered in our model. Existing models related to rank-clustering are limited by their inability to handle a variety of ordinal data types, to quantify uncertainty, or by the need to pre-specify the number and size of potential rank-clusters. We solve these limitations through our proposed Bayesian Rank-Clustered Bradley-Terry-Luce (BTL) model. We accommodate rank-clustering via parameter fusion by imposing a novel spike-and-slab prior on object-specific worth parameters in the BTL family of distributions for ordinal comparisons. We demonstrate rank-clustering on simulated and real datasets in surveys, elections, and sports analytics.
Michael Pearce, Elena A Erosheva. Bayesian Rank-Clustering. Psychometrika, pp. 1-28, published 2025-06-16. doi:10.1017/psy.2025.10014. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483714/pdf/
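To make the rank-clustering idea in the abstract above concrete, the following is a minimal sketch of the standard Bradley-Terry-Luce probabilities for pairwise comparisons and for full or top-k rankings. It is not the authors' Bayesian model or their spike-and-slab prior; the function names and the example worth values are hypothetical. It only illustrates that when two objects share the same worth parameter (are "fused"), the model cannot distinguish their order.

```python
def btl_pairwise_prob(worth_i, worth_j):
    """P(object i is preferred to object j) for positive worth parameters."""
    return worth_i / (worth_i + worth_j)


def btl_ranking_prob(worths, ranking):
    """Probability of a full or top-k ranking under the Luce choice rule.

    worths  : dict mapping object -> positive worth (hypothetical values)
    ranking : objects listed from most to least preferred; may be partial
    """
    remaining = sum(worths.values())  # total worth of objects not yet ranked
    prob = 1.0
    for obj in ranking:
        prob *= worths[obj] / remaining  # choose obj among remaining objects
        remaining -= worths[obj]
    return prob


# Hypothetical worths: A and B share one value, i.e. they are rank-clustered.
worths = {"A": 2.0, "B": 2.0, "C": 1.0}
p_ab = btl_pairwise_prob(worths["A"], worths["B"])
p_abc = btl_ranking_prob(worths, ["A", "B", "C"])
p_bac = btl_ranking_prob(worths, ["B", "A", "C"])
```

Here `p_ab` is exactly one half, and the two rankings that only swap A and B receive the same probability, which is the sense in which equal worths correspond to equal ranks.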
In this article, we propose a series of latent trait models for the responses and the response times on low-stakes tests where some test takers respond prematurely without making a full effort to solve the items. The models consider individual differences in capability and persistence. The core of the models is a race between the solution process and a process of disengagement that interrupts the solution process. The different processes are modeled with the linear ballistic accumulator model. Within this general framework, we develop different model variants that differ in the number of accumulators and the way the response is generated when the solution process is interrupted. We distinguish no guessing, random guessing, and informed guessing, where the guessing probability depends on the status of the solution process. We conduct simulation studies on parameter recovery and on trait estimation. The simulation studies suggest that parameter values and traits can be recovered well under certain conditions. Finally, we apply the model variants to empirical data.
Jochen Ranger, Sören Much, Niklas Neek, Augustin Mutak, Steffi Pohl. Accounting for Persistence in Tests with Linear Ballistic Accumulator Models. Psychometrika, pp. 1-25, published 2025-06-16. doi:10.1017/psy.2025.10026. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483707/pdf/
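The race described in the abstract above can be sketched with two linear ballistic accumulators, one for the solution process and one for disengagement. This is an illustrative simulation only, not the authors' parameterization: the drift means `solve_drift` and `quit_drift`, the fixed threshold and start-point range, the rule that a finished solution process yields a correct response, and the resampling of trials in which neither accumulator terminates are all assumptions made here. Setting `guess_prob` to 0 corresponds to a no-guessing variant and a positive constant to random guessing.

```python
import math
import random


def lba_finish_time(rng, mean_drift, threshold=1.0, max_start=0.5, drift_sd=1.0):
    """Finishing time of one linear ballistic accumulator.

    Evidence starts at a uniform point in [0, max_start] and rises linearly
    with a normally distributed drift; a non-positive drift never finishes.
    """
    start = rng.uniform(0.0, max_start)
    drift = rng.gauss(mean_drift, drift_sd)
    if drift <= 0.0:
        return math.inf
    return (threshold - start) / drift


def simulate_response(rng, solve_drift, quit_drift, guess_prob=0.0):
    """Race between solving and disengaging; returns (correct, response_time)."""
    while True:
        t_solve = lba_finish_time(rng, solve_drift)
        t_quit = lba_finish_time(rng, quit_drift)
        if math.isfinite(min(t_solve, t_quit)):
            break  # simplification: redraw if neither process terminates
    if t_solve <= t_quit:
        return True, t_solve  # assumption: a completed solution is correct
    # Disengagement interrupted the solution process: guess or fail.
    return rng.random() < guess_prob, t_quit


def proportion_correct(solve_drift, quit_drift, n=2000, seed=1):
    """Share of correct responses over n simulated items."""
    rng = random.Random(seed)
    hits = sum(simulate_response(rng, solve_drift, quit_drift)[0] for _ in range(n))
    return hits / n


capable_persistent = proportion_correct(solve_drift=3.0, quit_drift=0.5)
quick_to_disengage = proportion_correct(solve_drift=0.5, quit_drift=3.0)
```

In this sketch a test taker whose solution process is fast relative to the disengagement process solves most items, while the reverse configuration produces mostly interrupted, fast responses.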
Andrew Ackerman, Zhengwu Zhang, Jan Hannig, Jack Prothero, J S Marron
Neuroimaging studies, such as the Human Connectome Project (HCP), often collect multifaceted data to study the human brain. However, these data are often analyzed in a pairwise fashion, which can hinder our understanding of how different brain-related measures interact. In this study, we analyze the multi-block HCP data using data integration via analysis of subspaces (DIVAS). We integrate structural and functional brain connectivity, substance use, cognition, and genetics in an exhaustive five-block analysis. This gives rise to the important finding that genetics is the single data modality most predictive of brain connectivity, outside of brain connectivity itself. Nearly 14% of the variation in functional connectivity (FC) and roughly 12% of the variation in structural connectivity (SC) is attributed to shared spaces with genetics. Moreover, investigations of shared space loadings provide interpretable associations between particular brain regions and drivers of variability. Novel Jackstraw hypothesis tests are developed for the DIVAS framework to establish statistically significant loadings. For example, in the (FC, SC, and substance use) subspace, these novel hypothesis tests highlight largely negative functional and structural connections suggesting the brain's role in physiological responses to increased substance use. Our findings are validated on genetically relevant subjects not studied in the main analysis.
Andrew Ackerman, Zhengwu Zhang, Jan Hannig, Jack Prothero, J S Marron. Multifaceted Neuroimaging Data Integration via Analysis of Subspaces. Psychometrika, pp. 1-22, published 2025-06-16. doi:10.1017/psy.2025.10020.
Peer grading is an educational system in which students assess each other's work. It is commonly applied in Massive Open Online Course (MOOC) and offline classroom settings. With this system, instructors receive a reduced grading workload, and students enhance their understanding of course materials by grading others' work. Peer grading data have a complex dependence structure, in which all the peer grades may be dependent. This complex dependence structure is due to the network structure of peer grading, where each student can be viewed as a vertex of the network, and each peer grade serves as an edge connecting one student as a grader to another student as an examinee. This article introduces a latent variable model framework for analyzing peer grading data and develops a fully Bayesian procedure for its statistical inference. This framework has several advantages. First, when aggregating multiple peer grades, the average score and other simple summary statistics fail to account for grader effects and, thus, can be biased. The proposed approach produces more accurate model parameter estimates and, therefore, more accurate aggregated grades by modeling the heterogeneous grading behavior with latent variables. Second, the proposed method provides a way to assess each student's performance as a grader, which may be used to identify a pool of reliable graders or generate feedback to help students improve their grading. Third, our model may further provide insights into the peer grading system by answering questions such as whether a student who performs better in coursework also tends to be a more reliable grader. Finally, thanks to the Bayesian approach, uncertainty quantification is straightforward when inferring the student-specific latent variables as well as the structural parameters of the model. The proposed method is applied to two real-world datasets.
Giuseppe Mignemi, Yunxiao Chen, Irini Moustaki. Unfolding the Network of Peer Grades: A Latent Variable Approach. Psychometrika, pp. 1-22, published 2025-06-16. doi:10.1017/psy.2025.10021. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483701/pdf/
Yuanyuan Ji, Jordan Revol, Anna Schouten, Marieke J Schreuder, Eva Ceulemans
Researchers interested in dyadic processes increasingly collect intensive longitudinal data (ILD), with the longitudinal actor-partner interdependence model (L-APIM) being a popular modeling approach. However, due to non-compliance and the use of conditional questions, ILD are almost always incomplete. These missing data issues become more prominent in dyadic studies, because partners often miss different measurement occasions or disagree about features that trigger conditional questions. Large amounts of missing data challenge the L-APIM's estimation performance. Specifically, we found that non-convergence occurred when applying the L-APIM to pre-existing dyadic diary data with a lot of missing values. Using a simulation study, we systematically examined the performance of the L-APIM in dyadic ILD with missing values. Consistent with our illustrative data, we found that non-convergence often occurred in conditions with small sample sizes, while the fixed within-person actor and partner effects were well estimated when analyses did converge. Additionally, considering potential convergence failures with the L-APIM, we investigated 31 alternative models and evaluated their performance on simulated and empirical data, showing that multiple alternatives may alleviate the convergence problems. Overall, when the L-APIM fails to converge, we recommend fitting multiple alternative models to check the robustness of the results.
Yuanyuan Ji, Jordan Revol, Anna Schouten, Marieke J Schreuder, Eva Ceulemans. Performance of the Longitudinal Actor-Partner Interdependence Model in Case of Large Amounts of Missing Values: Challenges and Possible Alternatives. Psychometrika, pp. 1-23, published 2025-06-13. doi:10.1017/psy.2025.18. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483710/pdf/
Fitting propensity (FP) analysis quantifies model complexity but has been impeded in item response theory (IRT) due to the computational infeasibility of uniformly and randomly sampling multinomial item response patterns under a full-information approach. We adopt a limited-information (LI) approach, wherein we generate data only up to the lower-order margins of the complete item response patterns. We present an algorithm that builds upon classical work on sampling contingency tables with fixed margins by implementing a Sequential Importance Sampling algorithm to Quickly and Uniformly Obtain Contingency tables (SISQUOC). Theoretical justification and comprehensive validation demonstrate the effectiveness of the SISQUOC algorithm for IRT and offer insights into sampling from the complete data space defined by the lower-order margins. We highlight the efficiency and simplicity of the LI approach for generating large and uniformly random datasets of dichotomous and polytomous items. We further present an iterative proportional fitting procedure to reconstruct joint multinomial probabilities after LI-based data generation, facilitating FP evaluation using traditional estimation strategies. We illustrate the proposed approach by examining the FP of the graded response model and generalized partial credit model, with results suggesting that their functional forms express similar degrees of configural complexity.
Yon Soo Suh, Wes Bonifay, Li Cai. Random Item Response Data Generation Using a Limited-Information Approach: Applications to Assessing Model Complexity. Psychometrika, pp. 1-28, published 2025-05-21. doi:10.1017/psy.2025.10017. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483694/pdf/
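The abstract above mentions an iterative proportional fitting (IPF) procedure for reconstructing joint probabilities from lower-order margins. As a rough illustration of the general technique, and not of the authors' multi-item procedure, the sketch below fits a two-way table to prescribed row and column margins. The function name `ipf_two_way`, the uniform starting table, and the margin values are assumptions chosen for the example.

```python
def ipf_two_way(start, row_targets, col_targets, max_iter=100, tol=1e-12):
    """Rescale a positive two-way table until its margins match the targets.

    start       : list of lists with positive entries (initial table)
    row_targets : desired row sums
    col_targets : desired column sums (same total as row_targets)
    """
    table = [row[:] for row in start]  # work on a copy
    for _ in range(max_iter):
        # Step 1: scale each row to its target sum.
        for i, row in enumerate(table):
            row_sum = sum(row)
            table[i] = [x * row_targets[i] / row_sum for x in row]
        # Step 2: scale each column to its target sum.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Stop once the row sums also agree with their targets.
        gap = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if gap < tol:
            break
    return table


# Hypothetical margins for two dichotomous items, starting from a flat table.
fitted = ipf_two_way([[1.0, 1.0], [1.0, 1.0]], [0.3, 0.7], [0.4, 0.6])
```

Starting from a flat table, the fitted cells equal the products of the margins (the independence table), because IPF preserves the association structure of the starting table and a flat table carries none.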
Bi-factor analysis is a form of confirmatory factor analysis widely used in psychological and educational measurement. The use of a bi-factor model requires specifying an explicit bi-factor structure on the relationship between the observed variables and the group factors. In practice, the bi-factor structure is sometimes unknown, in which case, an exploratory form of bi-factor analysis is needed. Unfortunately, there are few methods for exploratory bi-factor analysis, with the exception of a rotation-based method proposed in Jennrich and Bentler ([2011, Psychometrika 76, pp. 537-549], [2012, Psychometrika 77, pp. 442-454]). However, the rotation method does not yield an exact bi-factor loading structure, even after hard thresholding. In this article, we propose a constraint-based optimization method that learns an exact bi-factor loading structure from data, overcoming the issue with the rotation-based method. The key to the proposed method is a mathematical characterization of the bi-factor loading structure as a set of equality constraints, which allows us to formulate the exploratory bi-factor analysis problem as a constrained optimization problem in a continuous domain and solve the optimization problem with an augmented Lagrangian method. The power of the proposed method is shown via simulation studies and a real data example.
Jiawei Qiao, Yunxiao Chen, Zhiliang Ying. Exact Exploratory Bi-factor Analysis: A Constraint-Based Optimization Approach. Psychometrika, pp. 1-16, published 2025-05-16. doi:10.1017/psy.2025.17. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483720/pdf/
Michael D Hunter, Robert M Kirkpatrick, Michael C Neale
With models and research designs ever increasing in complexity, the foundational question of model identification is more important than ever. The determination of whether or not a model can be fit at all or fit to some particular data set is the essence of model identification. In this article, we pull from previously published work on data-independent model identification applicable to a broad set of structural equation models, and extend it further to include extremely flexible exogenous covariate effects and also to include data-dependent empirical model identification. For illustrative purposes, we apply this model identification solution to several small examples for which the answer is already known, including a real data example from the National Longitudinal Survey of Youth; however, the method applies similarly to models that are far from simple to comprehend. The solution is implemented in the open-source OpenMx package in R.
Michael D Hunter, Robert M Kirkpatrick, Michael C Neale. Show Me Some ID: A Universal Identification Program for Structural Equation Models. Psychometrika, pp. 1-24, published 2025-04-24. doi:10.1017/psy.2025.19. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483695/pdf/
Cognitive Diagnostic Models (CDMs) are popular discrete latent variable models in educational and psychological measurement. While existing CDMs mainly focus on binary or categorical responses, there is a growing need to extend them to model a wider range of response types, including but not limited to continuous and count-valued responses. Meanwhile, incorporating higher-order latent structures has become crucial for gaining deeper insights into cognitive processes. We propose a general modeling framework for higher-order CDMs for rich types of responses. Our framework features a highly flexible data layer that is adaptive to various response types and measurement models for CDMs. Importantly, we address a challenging exploratory estimation scenario where the item-attribute relationship, specified by the Q-matrix, is unknown and needs to be estimated along with other parameters. In the higher-order layer, we employ a probit-link with continuous latent traits to model the binary latent attributes, highlighting its benefits in terms of identifiability and computational efficiency. Theoretically, we propose transparent identifiability conditions for the exploratory setting. Computationally, we develop an efficient Monte Carlo Expectation-Maximization algorithm, which incorporates an efficient direct sampling scheme and requires significantly reduced simulated samples. Extensive simulation studies and a real data example demonstrate the effectiveness of our methodology.
{"title":"Exploratory General-Response Cognitive Diagnostic Models with Higher-Order Structures.","authors":"Jia Liu, Seunghyun Lee, Yuqi Gu","doi":"10.1017/psy.2025.15","DOIUrl":"10.1017/psy.2025.15","url":null,"abstract":"<p><p>Cognitive Diagnostic Models (CDMs) are popular discrete latent variable models in educational and psychological measurement. While existing CDMs mainly focus on binary or categorical responses, there is a growing need to extend them to model a wider range of response types, including but not limited to continuous and count-valued responses. Meanwhile, incorporating higher-order latent structures has become crucial for gaining deeper insights into cognitive processes. We propose a general modeling framework for higher-order CDMs for rich types of responses. Our framework features a highly flexible data layer that is adaptive to various response types and measurement models for CDMs. Importantly, we address a challenging exploratory estimation scenario where the item-attribute relationship, specified by the Q-matrix, is unknown and needs to be estimated along with other parameters. In the higher-order layer, we employ a probit-link with continuous latent traits to model the binary latent attributes, highlighting its benefits in terms of identifiability and computational efficiency. Theoretically, we propose transparent identifiability conditions for the exploratory setting. Computationally, we develop an efficient Monte Carlo Expectation-Maximization algorithm, which incorporates an efficient direct sampling scheme and requires significantly reduced simulated samples. Extensive simulation studies and a real data example demonstrate the effectiveness of our methodology.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-28"},"PeriodicalIF":3.1,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805203/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144058676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
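The higher-order layer mentioned in the abstract above, in which binary latent attributes depend on continuous latent traits through a probit link, can be pictured with a small simulation. This is a minimal sketch under assumed notation, not the authors' model specification or estimation algorithm: the intercepts, slopes, standard normal traits, and the function names `std_normal_cdf` and `simulate_attributes` are all hypothetical choices for illustration.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_attributes(n_people, intercepts, slopes, seed=0):
    """Draw binary attributes whose probabilities follow a probit link.

    Assumed form: P(attribute k = 1 | traits) = Phi(intercept_k + slopes_k . traits),
    with independent standard normal traits.
    """
    rng = np.random.default_rng(seed)
    intercepts = np.asarray(intercepts, dtype=float)   # shape (K,)
    slopes = np.asarray(slopes, dtype=float)           # shape (K, D)
    traits = rng.standard_normal((n_people, slopes.shape[1]))
    linear = intercepts + traits @ slopes.T            # shape (n_people, K)
    probs = np.vectorize(std_normal_cdf)(linear)
    attributes = (rng.random(linear.shape) < probs).astype(int)
    return traits, probs, attributes

# Three binary attributes driven by two continuous traits (illustrative values).
traits, probs, attributes = simulate_attributes(
    n_people=500,
    intercepts=[0.0, 0.5, -0.5],
    slopes=[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
)
print(attributes.mean(axis=0))  # empirical attribute prevalences
```

In a full CDM the simulated attribute profiles would then feed a measurement layer for the observed responses. That layer, and the Q-matrix that links items to attributes, are left out of this sketch.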