Test fairness is a major concern in psychometric and educational research. A typical approach for ensuring test fairness is through differential item functioning (DIF) analysis. DIF arises when a test item functions differently across subgroups that are typically defined by the respondents' demographic characteristics. Most of the existing research focuses on the statistical detection of DIF, yet less attention has been given to reducing or eliminating DIF. Simultaneously, the use of computer-based assessments has become increasingly popular. The data obtained from respondents interacting with an item are recorded in computer log files and are referred to as process data. In this article, we propose a novel method within the framework of generalized linear models that leverages process data to reduce and understand DIF. Specifically, we construct a nuisance trait surrogate with the features extracted from process data. With the constructed nuisance trait, we introduce a new scoring rule that incorporates respondents' behaviors captured through process data on top of the target latent trait. We demonstrate the efficiency of our approach through extensive simulation experiments and an application to 13 Problem Solving in Technology-Rich Environments items from the 2012 Programme for the International Assessment of Adult Competencies assessment.
{"title":"Reducing Differential Item Functioning via Process Data.","authors":"Ling Chen, Susu Zhang, Jingchen Liu","doi":"10.1017/psy.2025.10072","journal":{"name":"Psychometrika","pages":"1-36"},"PeriodicalIF":3.1,"publicationDate":"2025-12-10","publicationTypes":"Journal Article"}
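To make the idea in the abstract above concrete, here is one common generalized linear formulation of uniform DIF, extended with a nuisance trait term. This is only a sketch: the symbols $a_j$, $b_j$, $\gamma_j$, $\gamma_j^{*}$, $\delta_j$, $g_i$, and $\eta_i$ are hypothetical notation introduced for illustration and are not taken from the article, whose exact specification may differ.

```latex
% Hypothetical notation, not taken from the article.
% Baseline model for respondent i on item j with group indicator g_i
% (uniform DIF is present when \gamma_j \neq 0):
\operatorname{logit} P(Y_{ij} = 1 \mid \theta_i, g_i)
  = a_j \theta_i + b_j + \gamma_j g_i

% Augmented model with a nuisance trait surrogate \eta_i
% constructed from process-data features:
\operatorname{logit} P(Y_{ij} = 1 \mid \theta_i, \eta_i, g_i)
  = a_j \theta_i + b_j + \delta_j \eta_i + \gamma_j^{*} g_i
```

Under this reading, DIF is reduced for item $j$ when $|\gamma_j^{*}| < |\gamma_j|$, that is, when the nuisance trait absorbs part of the group difference that the target trait $\theta_i$ does not explain.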
Michael Fauss, Xiang Liu, Chen Li, Ikkyu Choi, H Vincent Poor
This article investigates the problem of automatically flagging test takers who exhibit atypical responses or behaviors for further review by human experts. The objective is to develop a selection policy that maximizes the expected number of test takers correctly identified as warranting additional scrutiny while maintaining a manageable volume of reviews per test administration. The selection procedure should learn from the outcomes of the expert reviews. Since typically only a fraction of test takers are reviewed, this leads to a semi-supervised learning problem. The latter is formalized in a Bayesian setting, and the corresponding optimal selection policy is derived. Since calculating the policy and the underlying posterior distributions is computationally infeasible, a variational approximation and three heuristic selection policies are proposed. These policies are informed by properties of the optimal policy and correspond to different exploration/exploitation trade-offs. The performance of the approximate policies is assessed via numerical experiments using both synthetic and real-world data and is compared with procedures based on off-the-shelf algorithms as well as theoretical performance bounds.
{"title":"Bayesian Selection Policies for Human-in-the-Loop Anomaly Detectors with Applications in Test Security.","authors":"Michael Fauss, Xiang Liu, Chen Li, Ikkyu Choi, H Vincent Poor","doi":"10.1017/psy.2025.10056","journal":{"name":"Psychometrika","pages":"1-33"},"PeriodicalIF":3.1,"publicationDate":"2025-12-10","publicationTypes":"Journal Article"}
Identifying causal directions among variables via data-driven approaches is an active research area. Recent work focuses on detecting heterogeneity in causal direction among more than two variables when covariates cause such heterogeneity. This study combines the structural equation likelihood function (SELF) method with recursive partitioning to obtain an interpretable model of multivariate causal direction heterogeneity. Through simulation, we evaluated how well the SELF-Tree model identifies heterogeneous causal directions under different conditions. Using a public drug consumption dataset, we demonstrated its application to real data. The SELF-Tree model offers researchers a new way to understand heterogeneity in causal direction among variables.
{"title":"SELF-Tree: An Interpretable Model for Multivariate Causal Direction Heterogeneity Analysis.","authors":"Zhifei Li, Hongbo Wen","doi":"10.1017/psy.2025.10067","journal":{"name":"Psychometrika","pages":"1-31"},"PeriodicalIF":3.1,"publicationDate":"2025-12-10","publicationTypes":"Journal Article"}
{"title":"The bit scale: A metric score scale for unidimensional item response theory models.","authors":"Joakim Wallmark, Marie Wiberg","doi":"10.1017/psy.2025.10071","journal":{"name":"Psychometrika","pages":"1-32"},"PeriodicalIF":3.1,"publicationDate":"2025-12-10","publicationTypes":"Journal Article"}
In educational testing, inferences about ability have mainly been based on item responses, while the time taken to complete an item is often ignored. To better infer ability, a new class of state space models, which jointly model response times and time series of dichotomous responses, is developed. Simulations for the proposed models demonstrate that the bias of ability estimation is reduced and its precision is improved. An empirical study is conducted using EdSphere datasets, where two competing relationships (i.e., monotone and inverted U-shape) for the distance between ability and difficulty are investigated in modeling response times. The model comparison results indicate that the inverted U-shape relationship better captures the behavior and psychology of examinees in the EdSphere datasets.
{"title":"Bayesian Joint Modeling of Response Times with Dynamic Latent Ability in Educational Testing.","authors":"Xiaojing Wang, Abhisek Saha, Dipak K Dey","doi":"10.1017/psy.2025.10019","journal":{"name":"Psychometrika","pages":"1-23"},"PeriodicalIF":3.1,"publicationDate":"2025-12-02","publicationTypes":"Journal Article"}
Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance, through violations of latent normality assumptions. We propose a novel estimator that is designed to be robust against partial misspecification of the polychoric model, that is, when the model is misspecified for an unknown fraction of observations, such as careless respondents. To this end, the estimator minimizes a robust loss function based on the divergence between observed frequencies and theoretical frequencies implied by the polychoric model. In contrast to existing literature, our estimator makes no assumption on the type or degree of model misspecification. It furthermore generalizes ML estimation, is consistent as well as asymptotically normally distributed, and comes at no additional computational cost. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and an empirical application on a Big Five administration. In the latter, the polychoric correlation estimates of our estimator and ML differ substantially, which, after further inspection, is likely due to the presence of careless respondents that the estimator helps identify.
{"title":"Robust Estimation of Polychoric Correlation.","authors":"Max Welz, Patrick Mair, Andreas Alfons","doi":"10.1017/psy.2025.10066","journal":{"name":"Psychometrika","pages":"1-32"},"PeriodicalIF":3.1,"publicationDate":"2025-12-01","publicationTypes":"Journal Article"}
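For readers unfamiliar with the baseline that the abstract above criticizes, the following is a minimal sketch of the conventional two-step ML estimate of a polychoric correlation, using only the Python standard library. It is not the robust estimator proposed in the article. The function names (`polychoric_ml`, `simulate_table`), the clipping of infinite thresholds to ±8, the midpoint integration rule, and the golden-section search are all illustrative assumptions, and the sketch assumes every category is observed at least once.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal

def _thresholds(margin_counts):
    """Thresholds from cumulative marginal proportions; outer bounds clipped to +/-8."""
    total = sum(margin_counts)
    cum, cuts = 0, [-8.0]
    for c in margin_counts[:-1]:
        cum += c
        cuts.append(_STD.inv_cdf(cum / total))
    cuts.append(8.0)
    return cuts

def _cell_prob(lo_x, hi_x, lo_y, hi_y, rho, steps=200):
    """P(lo_x < X <= hi_x, lo_y < Y <= hi_y) under a standard bivariate normal.

    Integrates phi(x) * P(lo_y < Y <= hi_y | X = x) over x with the midpoint rule.
    """
    s = math.sqrt(1.0 - rho * rho)
    h = (hi_x - lo_x) / steps
    total = 0.0
    for k in range(steps):
        x = lo_x + (k + 0.5) * h
        inner = _STD.cdf((hi_y - rho * x) / s) - _STD.cdf((lo_y - rho * x) / s)
        total += _STD.pdf(x) * inner
    return total * h

def _loglik(table, cx, cy, rho):
    """Multinomial log-likelihood of the contingency table for a given rho."""
    ll = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n:
                p = _cell_prob(cx[i], cx[i + 1], cy[j], cy[j + 1], rho)
                ll += n * math.log(max(p, 1e-300))
    return ll

def polychoric_ml(table, tol=1e-4):
    """Two-step ML: fix thresholds from the margins, then maximize over rho."""
    cx = _thresholds([sum(row) for row in table])
    cy = _thresholds([sum(col) for col in zip(*table)])
    lo, hi = -0.99, 0.99
    g = (math.sqrt(5) - 1) / 2  # golden-section ratio
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = _loglik(table, cx, cy, a), _loglik(table, cx, cy, b)
    while hi - lo > tol:
        if fa < fb:  # maximum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = _loglik(table, cx, cy, b)
        else:        # maximum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = _loglik(table, cx, cy, a)
    return (lo + hi) / 2

def simulate_table(n, rho, cuts, seed=0):
    """Discretize n draws from a bivariate normal into a contingency table."""
    rng = random.Random(seed)
    k = len(cuts) + 1
    table = [[0] * k for _ in range(k)]
    s = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + s * rng.gauss(0, 1)
        table[sum(x > c for c in cuts)][sum(y > c for c in cuts)] += 1
    return table
```

Calling `polychoric_ml(simulate_table(4000, 0.6, [-0.5, 0.5]))` should return a value near the generating correlation when the latent normality assumption holds. Because every cell enters the likelihood with equal trust, a subset of careless respondents can shift the estimate, which is the sensitivity the abstract describes.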
In this paper, we generalize the multidimensional discrimination and difficulty parameters in the multidimensional two-parameter logistic model to account for nonidentity latent covariances and negatively keyed items. We apply Reckase's maximum discrimination point method to define them in an arbitrary algebraic basis. Then, we define that basis to be a geometrical representation of the measured construct. This results in three different versions of the parameters: the original one, based on the item parameters solely; one that incorporates the covariance structure of the latent space; and one that uses the correlation structure instead. Importantly, we find that the items should be properly represented in a test space, distinct from the latent space. We also provide a procedure for the geometrical representation of the items in the test space and apply our results to examples from the literature to get a more accurate representation of the measurement properties of the items. We recommend using the covariance structure version for describing the properties of the parameters and the correlation structure version for graphical representation. Finally, we discuss the implications of this generalization for other multidimensional item response theory models and the parallels of our results in common factor model theory.
{"title":"A Generalized Definition of Multidimensional Item Response Theory Parameters.","authors":"Daniel Morillo-Cuadrado, Mario Luzardo-Verde","doi":"10.1017/psy.2025.10063","journal":{"name":"Psychometrika","pages":"1-23"},"PeriodicalIF":3.1,"publicationDate":"2025-11-19","publicationTypes":"Journal Article"}
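As a point of reference for the abstract above, the classical Reckase parameters for the multidimensional two-parameter logistic model can be computed in a few lines. The sketch below covers only the original version, "based on the item parameters solely", and assumes the slope-intercept parameterization $P = 1 / (1 + e^{-(a^\top \theta + d)})$ with uncorrelated, unit-variance latent dimensions. It does not implement the covariance-based or correlation-based generalizations the article proposes, and the function names are illustrative.

```python
import math

def mdisc(a):
    """Multidimensional discrimination: Euclidean norm of the slope vector a."""
    return math.sqrt(sum(ak * ak for ak in a))

def mdiff(a, d):
    """Multidimensional difficulty: signed distance from the origin to the
    point of maximum discrimination, -d / MDISC."""
    return -d / mdisc(a)

def direction_cosines(a):
    """Cosines of the angles between the item direction and each latent axis."""
    m = mdisc(a)
    return [ak / m for ak in a]

# Example item with slopes (3, 4) and intercept -2.5.
a, d = [3.0, 4.0], -2.5
print(mdisc(a))              # 5.0
print(mdiff(a, d))           # 0.5
print(direction_cosines(a))  # [0.6, 0.8]
```

These formulas treat the latent axes as orthonormal, which is the restriction the article sets out to relax for nonidentity latent covariances.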
Andrea Brancaccio, Debora de Chiusole, Ottavia M Epifania, Pasquale Anselmi, Matilde Spinoso, Noemi Mazzoni, Alice Bacherini, Matteo Orsoni, Sara Giovagnoli, Irene Pierluigi, Mariagrazia Benassi, Giulia Balboni, Luca Stefanutti
Tower tasks are popular tools for measuring planning skills. The sequences of moves that respondents make in solving tower tasks can provide useful information about their planning skills. The article focuses on the distinction between a situation where planning occurs before action (pre-planning) and one where planning and action are interleaved throughout the execution of the task (interim-planning). While a model for pre-planning was already developed by Stefanutti et al. (2021), an alternative model for interim-planning is proposed. The two models are compared with one another in an empirical study. In accordance with the literature on the development of planning skills, the pre-planning model better fits data collected on individuals aged 14 and older, while the interim-planning model displays a better fit with data collected on individuals aged 4-8. This result is further corroborated by the analysis of time performance.
{"title":"Two Markov Solution Process Models for the Assessment of Planning in Problem Solving.","authors":"Andrea Brancaccio, Debora de Chiusole, Ottavia M Epifania, Pasquale Anselmi, Matilde Spinoso, Noemi Mazzoni, Alice Bacherini, Matteo Orsoni, Sara Giovagnoli, Irene Pierluigi, Mariagrazia Benassi, Giulia Balboni, Luca Stefanutti","doi":"10.1017/psy.2025.10042","journal":{"name":"Psychometrika","pages":"1-31"},"PeriodicalIF":3.1,"publicationDate":"2025-11-13","publicationTypes":"Journal Article"}
A ranking pattern approach is proposed to build item response theory (IRT) models for forced-choice (FC) items. This new approach is an addition to the two existing approaches, sequential selection and Thurstone's law of pairwise comparison. A new dominance IRT model, the multidimensional generalized partial preference model (MGPPM), is proposed for FC items with any number (greater than 1) of statements. Maximum marginal likelihood estimation using an expectation-maximization algorithm (MML-EM) and Markov chain Monte Carlo (MCMC) estimation are developed. A simulation study is conducted to show satisfactory parameter recovery on triplet and tetrad data. The relationships between the newly proposed approach/model and the existing approaches/models are described, and the MGPPM, Thurstonian IRT (TIRT) model, and Triplet-2PLM are compared when applied to simulated and real triplet data. The new approach offers more flexible IRT modeling than the other two approaches under different assumptions, and the MGPPM is more statistically elegant than the TIRT and Triplet-2PLM.
{"title":"Multidimensional Generalized Partial Preference Model for Forced-Choice Items.","authors":"Daniel C Furr, Jianbin Fu","doi":"10.1017/psy.2025.10054","journal":{"name":"Psychometrika","pages":"1-30"},"PeriodicalIF":3.1,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805200/pdf/"}
Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational autoencoders (VAEs) are widely used to model high-dimensional latent variables in this context, but the limited expressiveness of their inference networks can still hinder performance. We introduce adversarial variational Bayes (AVB) and an importance-weighted extension (IWAVB) as more flexible inference algorithms for IFA. By combining VAEs with generative adversarial networks (GANs), AVB uses an auxiliary discriminator network to frame estimation as a two-player game and removes the restrictive standard normal assumption on the latent variables. Theoretically, AVB and IWAVB can achieve likelihoods that match or exceed those of VAEs and importance-weighted autoencoders (IWAEs). In exploratory analyses of empirical data, IWAVB attained higher likelihoods than IWAE, indicating greater expressiveness. In confirmatory simulations, IWAVB achieved comparable mean-square error in parameter recovery while consistently yielding higher likelihoods, and it clearly outperformed IWAE when the latent distribution was multimodal. These findings suggest that IWAVB can scale IFA to complex, large-scale, and potentially multimodal settings, supporting closer integration of psychometrics with modern multimodal data analysis.
{"title":"Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm.","authors":"Nanyu Luo, Feng Ji","doi":"10.1017/psy.2025.10059","journal":{"name":"Psychometrika","pages":"1-24"},"PeriodicalIF":3.1,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805202/pdf/"}
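The abstract above compares likelihood bounds across VAE-style and importance-weighted methods. The following toy sketch illustrates only the generic importance-weighted bound that underlies IWAE, not the adversarial (AVB/IWAVB) algorithm from the article. It assumes a hypothetical one-dimensional model, $z \sim N(0, 1)$ and $x \mid z \sim N(z, 1)$, chosen because the marginal $p(x) = N(0, 2)$ and the posterior $N(x/2, 1/2)$ are known in closed form. All function names are illustrative.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_weights(x, k, q_mean, q_var, rng):
    """Log importance weights log p(x, z) - log q(z | x) for k draws z ~ q."""
    out = []
    for _ in range(k):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        out.append(log_joint - log_normal_pdf(z, q_mean, q_var))
    return out

def elbo(lw):
    """Standard evidence lower bound estimate: mean of the log weights."""
    return sum(lw) / len(lw)

def iw_bound(lw):
    """Importance-weighted bound estimate: log of the mean weight (log-sum-exp trick)."""
    m = max(lw)
    return m + math.log(sum(math.exp(v - m) for v in lw) / len(lw))

x = 1.5
rng = random.Random(0)
# Deliberately mismatched proposal: the prior N(0, 1) instead of the posterior.
lw = log_weights(x, 50, 0.0, 1.0, rng)
print("ELBO estimate:     ", elbo(lw))
print("IW bound estimate: ", iw_bound(lw))
print("exact log p(x):    ", log_normal_pdf(x, 0.0, 2.0))
```

By Jensen's inequality, the importance-weighted estimate is never below the ELBO estimate computed from the same draws. When the proposal equals the exact posterior, every log weight equals $\log p(x)$ and the two bounds coincide, which is why a more expressive inference model can tighten the bound.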