Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational autoencoders (VAEs) are widely used to model high-dimensional latent variables in this context, but the limited expressiveness of their inference networks can still hinder performance. We introduce adversarial variational Bayes (AVB) and an importance-weighted extension (IWAVB) as more flexible inference algorithms for IFA. By combining VAEs with generative adversarial networks (GANs), AVB uses an auxiliary discriminator network to frame estimation as a two-player game and removes the restrictive standard normal assumption on the latent variables. Theoretically, AVB and IWAVB can achieve likelihoods that match or exceed those of VAEs and importance-weighted autoencoders (IWAEs). In exploratory analyses of empirical data, IWAVB attained higher likelihoods than IWAE, indicating greater expressiveness. In confirmatory simulations, IWAVB achieved comparable mean-square error in parameter recovery while consistently yielding higher likelihoods, and it clearly outperformed IWAE when the latent distribution was multimodal. These findings suggest that IWAVB can scale IFA to complex, large-scale, and potentially multimodal settings, supporting closer integration of psychometrics with modern multimodal data analysis.
{"title":"Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm.","authors":"Nanyu Luo, Feng Ji","doi":"10.1017/psy.2025.10059","DOIUrl":"10.1017/psy.2025.10059","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-24"},"PeriodicalIF":3.1,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805202/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145490878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
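The importance-weighted idea behind IWAE and IWAVB in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm: it assumes a one-dimensional Gaussian model (prior z ~ N(0, 1), likelihood x | z ~ N(z, 1)) instead of an item factor model, and it omits the adversarial discriminator entirely. The function names `log_normal_pdf` and `iw_bound` are hypothetical. The sketch shows only that the bound log(1/K Σ w_k), with weights w_k = p(z_k) p(x | z_k) / q(z_k | x), becomes exact when the inference distribution q equals the true posterior, which is the motivation for making q more flexible.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def iw_bound(x, z, q_mean, q_var):
    """Importance-weighted bound log(mean_k w_k) for the toy model
    z ~ N(0, 1), x | z ~ N(z, 1), with proposal q(z | x) = N(q_mean, q_var)."""
    log_w = (log_normal_pdf(z, 0.0, 1.0)        # log prior
             + log_normal_pdf(x, z, 1.0)        # log likelihood
             - log_normal_pdf(z, q_mean, q_var))  # log proposal density
    m = np.max(log_w)                           # stabilised log-mean-exp
    return m + np.log(np.mean(np.exp(log_w - m)))

rng = np.random.default_rng(0)
x = 1.3   # a single hypothetical observation
K = 50    # number of importance samples

# Proposal equal to the exact posterior N(x / 2, 1 / 2): every weight equals p(x).
z_exact = rng.normal(x / 2.0, np.sqrt(0.5), size=K)
exact_bound = iw_bound(x, z_exact, x / 2.0, 0.5)

# A cruder proposal (the prior itself) gives a noisier estimate of log p(x).
z_prior = rng.normal(0.0, 1.0, size=K)
prior_bound = iw_bound(x, z_prior, 0.0, 1.0)

true_log_marginal = log_normal_pdf(x, 0.0, 2.0)  # x ~ N(0, 2) marginally
print(exact_bound, prior_bound, true_log_marginal)
```

With the exact posterior as proposal, `exact_bound` matches `true_log_marginal` up to rounding error for any K; with the prior as proposal, the estimate depends on the drawn samples.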
M Marsman, L J Waldorp, N Sekulovski, J M B Haslbeck
{"title":"Bayes Factor Tests for Group Differences in Ordinal and Binary Graphical Models.","authors":"M Marsman, L J Waldorp, N Sekulovski, J M B Haslbeck","doi":"10.1017/psy.2025.10060","DOIUrl":"10.1017/psy.2025.10060","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-49"},"PeriodicalIF":3.1,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805206/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145439986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cultural consensus theory (CCT) leverages shared knowledge between individuals to optimally aggregate answers to questions for which the underlying truth is unknown. Existing CCT models have predominantly focused on unidimensional point truths using dichotomous, polytomous, or continuous response formats. However, certain domains, such as risk assessment or interpretation of verbal quantifiers, may require a consensus focused on intervals, capturing a range of relevant values. We introduce the interval consensus model (ICM), a novel extension of CCT designed to estimate consensus intervals from continuous bounded interval responses. We use a Bayesian hierarchical modeling approach to estimate latent consensus intervals. In a simulation study, we show that, under the conditions studied, the ICM performs better than using simple means and medians of the responses. We then apply the model to empirical judgments of verbal quantifiers.
{"title":"The Interval Consensus Model: Aggregating Continuous Bounded Interval Responses.","authors":"Matthias Kloft, Björn S Siepe, Daniel W Heck","doi":"10.1017/psy.2025.10058","DOIUrl":"10.1017/psy.2025.10058","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-24"},"PeriodicalIF":3.1,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145440013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
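The interval consensus abstract above compares the ICM against simple means and medians of the responses. As a rough sketch of what such a baseline might look like (the ICM itself is a Bayesian hierarchical model and is not reproduced here), the code below aggregates the lower and upper bounds separately. The data, the 0 to 100 response scale, and the function name `aggregate_intervals` are all hypothetical.

```python
import numpy as np

# Hypothetical data: 4 raters x 2 items x (lower bound, upper bound), scale 0-100.
responses = np.array([
    [[10, 30], [60, 90]],
    [[20, 40], [50, 80]],
    [[15, 35], [70, 95]],
    [[ 5, 85], [65, 85]],  # one unusually wide interval on the first item
], dtype=float)

def aggregate_intervals(resp, how="mean"):
    """Aggregate interval responses over raters, bound by bound."""
    reducers = {"mean": np.mean, "median": np.median}
    return reducers[how](resp, axis=0)  # shape: items x 2

mean_intervals = aggregate_intervals(responses, "mean")
median_intervals = aggregate_intervals(responses, "median")
print(mean_intervals)    # first item: [12.5, 47.5]
print(median_intervals)  # first item: [12.5, 37.5]
```

In this made-up example the single wide response pulls the mean upper bound of the first item to 47.5, while the median stays at 37.5. Neither baseline models rater-specific tendencies.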
Wataru Urasaki, Tomoyuki Nakagawa, Jun Tsuchida, Kouji Tahata
When the row and column variables of a two-way contingency table share the same categories, the table is called a square contingency table. Because observations in square contingency tables tend to concentrate near the main diagonal, which induces an association structure, a primary objective is to examine symmetric relationships and transitions between the variables. Various models and measures have been proposed to analyze these structures and to understand how behavior changes between two variables observed at two time points or in two cohorts. Such analysis is necessary for a detailed investigation of individual categories and their interrelationships, such as shifts in brand preferences. We propose a novel approach to correspondence analysis (CA) for evaluating departures from symmetry in square contingency tables with nominal categories, using a modified divergence statistic. This approach allows well-known divergence statistics to be visualized as well, and, regardless of the divergence statistic used, the CA plot consists of two principal axes with equal contribution rates. Notably, the scaling of the departures from symmetry provided by the modified divergence statistic is independent of sample size, allowing for meaningful comparisons and unification of results across different tables. Confidence regions are also constructed to enhance the accuracy of the CA plot.
{"title":"Visualization for Departures from Symmetry with the Power-Divergence-Type Measure in Square Contingency Tables.","authors":"Wataru Urasaki, Tomoyuki Nakagawa, Jun Tsuchida, Kouji Tahata","doi":"10.1017/psy.2025.10057","DOIUrl":"10.1017/psy.2025.10057","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-20"},"PeriodicalIF":3.1,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805207/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145432909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
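The abstract above notes that the scaling of departures from symmetry is independent of sample size. The sketch below illustrates why a count-based test statistic and a proportion-based divergence behave differently in that respect. It does not reproduce the authors' modified divergence statistic: it assumes the classical Bowker symmetry statistic as the count-based quantity and a plain Kullback-Leibler divergence between the off-diagonal proportions and their symmetrized version as the sample-size-free quantity. The table values and function names are hypothetical.

```python
import numpy as np

def bowker_statistic(table):
    """Bowker's chi-square statistic for symmetry: sum over i<j of
    (n_ij - n_ji)^2 / (n_ij + n_ji). Grows with the sample size."""
    n = np.asarray(table, dtype=float)
    iu = np.triu_indices_from(n, k=1)
    upper, lower = n[iu], n.T[iu]          # n_ij and n_ji for i < j
    total = upper + lower
    mask = total > 0                       # skip empty cell pairs
    return float(np.sum((upper[mask] - lower[mask]) ** 2 / total[mask]))

def kl_asymmetry(table):
    """KL divergence between off-diagonal proportions and their
    symmetrised version (illustrative only, not the paper's measure)."""
    n = np.asarray(table, dtype=float)
    off = ~np.eye(n.shape[0], dtype=bool)  # off-diagonal cells
    p = n / n[off].sum()                   # proportions among off-diagonal cells
    sym = (p + p.T) / 2.0                  # symmetric reference structure
    mask = off & (p > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / sym[mask])))

# Hypothetical 3x3 table, e.g. brand choice at two time points.
table = np.array([[50, 10, 5],
                  [20, 40, 8],
                  [ 5,  4, 30]])

print(bowker_statistic(table), bowker_statistic(10 * table))  # scales with N
print(kl_asymmetry(table), kl_asymmetry(10 * table))          # unchanged
```

Multiplying every cell by 10 multiplies the Bowker statistic by 10 but leaves the proportion-based divergence unchanged, which is the property that makes comparisons across tables of different sizes meaningful.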
{"title":"MODELING MISSING AT RANDOM NEUROPSYCHOLOGICAL TEST SCORES USING A MIXTURE OF BINOMIAL PRODUCT EXPERTS.","authors":"Daniel Suen, Yen-Chi Chen","doi":"10.1017/psy.2025.10053","DOIUrl":"https://doi.org/10.1017/psy.2025.10053","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-74"},"PeriodicalIF":3.1,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Martin Papenberg, Martin Breuer, Max Diekhoff, Nguyen K Tran, Gunnar W Klau
{"title":"Extending the Bicriterion Approach for Anticlustering: Exact and Hybrid Approaches.","authors":"Martin Papenberg, Martin Breuer, Max Diekhoff, Nguyen K Tran, Gunnar W Klau","doi":"10.1017/psy.2025.10052","DOIUrl":"10.1017/psy.2025.10052","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-43"},"PeriodicalIF":3.1,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805193/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145240242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliability analysis is one of the most frequently conducted analyses in applied psychometrics. It entails the assessment of reliability of both item scores and scale scores using coefficients that estimate the reliability (e.g., Cronbach's alpha), measurement precision (e.g., estimated standard error of measurement), or the contribution of individual items to the reliability (e.g., corrected item-total correlations). Most statistical software packages used in social and behavioral sciences offer these reliability coefficients, whereas standard errors are generally unavailable, which is a bit ironic for coefficients about measurement precision. This article provides analytic nonparametric standard errors for coefficients used in reliability analysis. As most scores used in behavioral sciences are discrete, standard errors are derived under the relatively unrestrictive multinomial sampling scheme. Tedious derivations are presented in appendices, and R functions for computing standard errors are available from the Open Science Framework. Bias and variance of standard errors, and coverage of the corresponding Wald-based confidence intervals, are studied using simulated item scores. Bias, variance, and coverage are generally satisfactory for larger sample sizes and when parameter values are not close to the boundary of the parameter space.
{"title":"Standard Errors for Reliability Coefficients.","authors":"L Andries van der Ark","doi":"10.1017/psy.2025.10050","DOIUrl":"10.1017/psy.2025.10050","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-26"},"PeriodicalIF":3.1,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805205/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145193499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
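Two of the coefficients named in the reliability abstract above, Cronbach's alpha and the estimated standard error of measurement, can be computed in a few lines. This is a minimal sketch of the point estimates only; it does not implement the analytic standard errors derived in the article (those are provided by the authors as R functions). The item scores and function names are hypothetical, and the standard error of measurement is assumed to use alpha as the reliability estimate.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix:
    k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var_sum = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

def standard_error_of_measurement(scores):
    """Estimated SEM: SD of total score * sqrt(1 - reliability),
    plugging in alpha as the reliability estimate (an assumption here)."""
    x = np.asarray(scores, dtype=float)
    total_sd = x.sum(axis=1).std(ddof=1)
    return total_sd * np.sqrt(max(0.0, 1.0 - cronbach_alpha(x)))

# Hypothetical discrete item scores: 5 respondents x 3 items.
scores = np.array([[1, 2, 2],
                   [2, 2, 3],
                   [3, 4, 4],
                   [4, 4, 5],
                   [5, 5, 5]])

alpha = cronbach_alpha(scores)
sem = standard_error_of_measurement(scores)
print(alpha, sem)
```

For these made-up scores the item variances sum to 6.0 and the total-score variance is 17.2, so alpha = 1.5 × (1 − 6.0 / 17.2). A point estimate like this says nothing about its own sampling variability, which is the gap the article addresses.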
Valerii Dashuk, Martin Hecht, Oliver Lüdtke, Alexander Robitzsch, Steffen Zitzmann
{"title":"An Optimally Regularized Estimator of Multilevel Latent Variable Models, with Improved MSE Performance.","authors":"Valerii Dashuk, Martin Hecht, Oliver Lüdtke, Alexander Robitzsch, Steffen Zitzmann","doi":"10.1017/psy.2025.10045","DOIUrl":"10.1017/psy.2025.10045","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-75"},"PeriodicalIF":3.1,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805209/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Obituary Robert J. Mislevy (1950-2025).","authors":"Roy Levy, Russell G Almond","doi":"10.1017/psy.2025.10049","DOIUrl":"https://doi.org/10.1017/psy.2025.10049","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-12"},"PeriodicalIF":3.1,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145066451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differential item functioning (DIF) screening has long been suggested to ensure assessment fairness. Traditional DIF methods typically focus on the main effects of demographic variables on item parameters, overlooking the interactions among multiple identities. Drawing on the intersectionality framework, we define intersectional DIF as deviations in item parameters that arise from the interactions among demographic variables beyond their main effects and propose a novel item response theory (IRT) approach for detecting intersectional DIF. Under our framework, fixed effects are used to account for traditional DIF, while random item effects are introduced to capture intersectional DIF. We further introduce the concept of intersectional impact, which refers to interaction effects on group-level mean ability. Depending on which item parameters are affected and whether intersectional impact is considered, we propose four models, which aim to detect intersectional uniform DIF (UDIF), intersectional UDIF with intersectional impact, intersectional non-uniform DIF (NUDIF), and intersectional NUDIF with intersectional impact, respectively. For efficient model estimation, a regularized Gaussian variational expectation-maximization algorithm is developed. Simulation studies demonstrate that our methods can effectively detect intersectional UDIF, although their detection of intersectional NUDIF is more limited.
{"title":"A Novel Method for Detecting Intersectional DIF: Multilevel Random Item Effects Model with Regularized Gaussian Variational Estimation.","authors":"He Ren, Weicong Lyu, Chun Wang, Gongjun Xu","doi":"10.1017/psy.2025.10046","DOIUrl":"10.1017/psy.2025.10046","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-25"},"PeriodicalIF":3.1,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145066440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
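The definition of intersectional DIF in the abstract above (deviations in item parameters that arise from interactions among demographic variables beyond their main effects) can be made concrete with a small numerical sketch. The code assumes a two-parameter logistic item with two binary demographic indicators `g1` and `g2`. The parameter names (`a`, `d`, `main`, `u`, `v`) and their values are hypothetical, and the intersectional terms are treated as fixed numbers, whereas the article models them as random item effects and estimates them with a regularized variational algorithm.

```python
import numpy as np

def item_logit(theta, g1, g2, a=1.2, d=-0.3, main=(0.4, -0.2), u=0.0, v=0.0):
    """Logit of a correct response for a 2PL-type item.
    main: main-effect (traditional) DIF on the intercept for g1 and g2.
    u:    intersectional shift of the intercept (uniform DIF).
    v:    intersectional shift of the slope (non-uniform DIF)."""
    slope = a + v * g1 * g2
    intercept = d + main[0] * g1 + main[1] * g2 + u * g1 * g2
    return slope * theta + intercept

def interaction_contrast(theta, **params):
    """Difference-in-differences of logits across the four groups;
    main effects cancel, leaving only the intersectional part."""
    return (item_logit(theta, 1, 1, **params) - item_logit(theta, 1, 0, **params)
            - item_logit(theta, 0, 1, **params) + item_logit(theta, 0, 0, **params))

theta = np.linspace(-2.0, 2.0, 5)  # grid of ability values

no_dif = interaction_contrast(theta)            # zero everywhere
udif = interaction_contrast(theta, u=0.5)       # constant 0.5 in theta
nudif = interaction_contrast(theta, v=0.3)      # 0.3 * theta, varies with ability
print(no_dif, udif, nudif)
```

The contrast is constant in ability under intersectional uniform DIF and varies linearly with ability under intersectional non-uniform DIF, which mirrors the UDIF/NUDIF distinction drawn in the abstract.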