{"title":"The bit scale: A metric score scale for unidimensional item response theory models.","authors":"Joakim Wallmark, Marie Wiberg","doi":"10.1017/psy.2025.10071","DOIUrl":"https://doi.org/10.1017/psy.2025.10071","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-32"},"PeriodicalIF":3.1,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145716765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In educational testing, inferences about ability have mainly been based on item responses, while the time taken to complete an item is often ignored. To better infer ability, a new class of state space models, which jointly model response times with time series of dichotomous responses, is developed. Simulations for the proposed models demonstrate that the bias of ability estimation is reduced and its precision is improved. An empirical study is conducted using EdSphere datasets, where two competing relationships (i.e., monotone and inverted U-shape) for the distance between ability and difficulty are investigated in modeling response times. The model comparison results indicate that the inverted U-shape relationship better captures the behavior and psychology of examinees in the EdSphere datasets.
{"title":"Bayesian Joint Modeling of Response Times with Dynamic Latent Ability in Educational Testing.","authors":"Xiaojing Wang, Abhisek Saha, Dipak K Dey","doi":"10.1017/psy.2025.10019","DOIUrl":"https://doi.org/10.1017/psy.2025.10019","url":null,"abstract":"<p><p>In educational testing, inferences of ability have been mainly based on item responses, while the time taken to complete an item is often ignored. To better infer the ability, a new class of state space models, which conjointly model response time with time series of dichotomous responses, is developed. Simulations for the proposed models demonstrate that the biases of ability estimation are reduced as well as the precisions of ability estimation are improved. An empirical study is conducted using EdSphere datasets, where the two competing relationships (i.e., monotone and inverted U-shape) for the distance between ability and difficulty are investigated in modeling response times. The results of model comparison support that the inverted U-shape relationship better captures the behaviors and psychology of examinees in exams for EdSphere datasets.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-23"},"PeriodicalIF":3.1,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance, through violations of latent normality assumptions. We propose a novel estimator that is designed to be robust against partial misspecification of the polychoric model, that is, when the model is misspecified for an unknown fraction of observations, such as careless respondents. To this end, the estimator minimizes a robust loss function based on the divergence between observed frequencies and theoretical frequencies implied by the polychoric model. In contrast to existing literature, our estimator makes no assumption on the type or degree of model misspecification. It furthermore generalizes ML estimation, is consistent as well as asymptotically normally distributed, and comes at no additional computational cost. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and an empirical application on a Big Five administration. In the latter, the polychoric correlation estimates of our estimator and ML differ substantially, which, after further inspection, is likely due to the presence of careless respondents that the estimator helps identify.
{"title":"Robust Estimation of Polychoric Correlation.","authors":"Max Welz, Patrick Mair, Andreas Alfons","doi":"10.1017/psy.2025.10066","DOIUrl":"10.1017/psy.2025.10066","url":null,"abstract":"<p><p>Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance, through violations of latent normality assumptions. We propose a novel estimator that is designed to be robust against partial misspecification of the polychoric model, that is, when the model is misspecified for an unknown fraction of observations, such as careless respondents. To this end, the estimator minimizes a robust loss function based on the divergence between observed frequencies and theoretical frequencies implied by the polychoric model. In contrast to existing literature, our estimator makes no assumption on the type or degree of model misspecification. It furthermore generalizes ML estimation, is consistent as well as asymptotically normally distributed, and comes at no additional computational cost. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and an empirical application on a Big Five administration. 
In the latter, the polychoric correlation estimates of our estimator and ML differ substantially, which, after further inspection, is likely due to the presence of careless respondents that the estimator helps identify.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-32"},"PeriodicalIF":3.1,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145650157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we generalize the multidimensional discrimination and difficulty parameters in the multidimensional two-parameter logistic model to account for nonidentity latent covariances and negatively keyed items. We apply Reckase's maximum discrimination point method to define them in an arbitrary algebraic basis. Then, we define that basis to be a geometrical representation of the measured construct. This results in three different versions of the parameters: the original one, based on the item parameters solely; one that incorporates the covariance structure of the latent space; and one that uses the correlation structure instead. Importantly, we find that the items should be properly represented in a test space, distinct from the latent space. We also provide a procedure for the geometrical representation of the items in the test space and apply our results to examples from the literature to get a more accurate representation of the measurement properties of the items. We recommend using the covariance structure version for describing the properties of the parameters and the correlation structure version for graphical representation. Finally, we discuss the implications of this generalization for other multidimensional item response theory models and the parallels of our results in common factor model theory.
{"title":"A Generalized Definition of Multidimensional Item Response Theory Parameters.","authors":"Daniel Morillo-Cuadrado, Mario Luzardo-Verde","doi":"10.1017/psy.2025.10063","DOIUrl":"10.1017/psy.2025.10063","url":null,"abstract":"<p><p>In this paper, we generalize the multidimensional discrimination and difficulty parameters in the multidimensional two-parameter logistic model to account for nonidentity latent covariances and negatively keyed items. We apply Reckase's maximum discrimination point method to define them in an arbitrary algebraic basis. Then, we define that basis to be a geometrical representation of the measured construct. This results in three different versions of the parameters: the original one, based on the item parameters solely; one that incorporates the covariance structure of the latent space; and one that uses the correlation structure instead. Importantly, we find that the items should be properly represented in a test space, distinct from the latent space. We also provide a procedure for the geometrical representation of the items in the test space and apply our results to examples from the literature to get a more accurate representation of the measurement properties of the items. We recommend using the covariance structure version for describing the properties of the parameters and the correlation structure version for graphical representation. 
Finally, we discuss the implications of this generalization for other multidimensional item response theory models and the parallels of our results in common factor model theory.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-23"},"PeriodicalIF":3.1,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145551540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrea Brancaccio, Debora de Chiusole, Ottavia M Epifania, Pasquale Anselmi, Matilde Spinoso, Noemi Mazzoni, Alice Bacherini, Matteo Orsoni, Sara Giovagnoli, Irene Pierluigi, Mariagrazia Benassi, Giulia Balboni, Luca Stefanutti
Tower tasks are popular tools used to measure planning skills. The sequences of moves that respondents make in solving tower tasks might provide important and useful information about their planning skills. The article focuses on the distinction between a situation where planning occurs before action (pre-planning) and one where planning and action are interlaced throughout the execution of the task (interim-planning). While a model for pre-planning was already developed by Stefanutti et al. (2021), an alternative model for interim-planning is proposed. The two models are compared with one another in an empirical study. In accordance with the literature on the development of planning skills, the pre-planning model better fits data collected on individuals aged 14 and older, while the interim-planning model displays a better fit with data collected on individuals aged 4-8. This result is further corroborated by the analysis of the time performance.
{"title":"Two Markov Solution Process Models for the Assessment of Planning in Problem Solving.","authors":"Andrea Brancaccio, Debora de Chiusole, Ottavia M Epifania, Pasquale Anselmi, Matilde Spinoso, Noemi Mazzoni, Alice Bacherini, Matteo Orsoni, Sara Giovagnoli, Irene Pierluigi, Mariagrazia Benassi, Giulia Balboni, Luca Stefanutti","doi":"10.1017/psy.2025.10042","DOIUrl":"https://doi.org/10.1017/psy.2025.10042","url":null,"abstract":"<p><p>Tower tasks are popular tools used to measure planning skills. The sequences of moves undertaken by the respondents in solving tower tasks might provide important and useful information to shed light on their planning skills. The article focuses on the distinction between a situation where planning occurs before action (pre-planning) from one where planning and action are interlaced all along the execution of the task (interim-planning). While the model for pre-planning was already developed by Stefanutti et al. (2021), an alternative model for the interim-planning is proposed. The two models are compared with one another in an empirical study. In accordance with the literature on the development of planning skills, the pre-planning model better fits data collected on individuals aged 14 on, while the interim-planning model displays a better fit with data collected on individuals aged 4-8. 
This result is further corroborated by the analysis of the time performance.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-31"},"PeriodicalIF":3.1,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145507979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A ranking pattern approach is proposed to build item response theory (IRT) models for forced-choice (FC) items. This new approach is an addition to the two existing approaches, sequential selection and Thurstone's law of pairwise comparison. A new dominance IRT model, the multidimensional generalized partial preference model (MGPPM), is proposed for FC items with any number (greater than 1) of statements. Maximum marginal likelihood estimation using an expectation-maximization algorithm (MML-EM) and Markov chain Monte Carlo (MCMC) estimation are developed. A simulation study is conducted to show satisfactory parameter recovery on triplet and tetrad data. The relationships between the newly proposed approach/model and the existing approaches/models are described, and the MGPPM, Thurstonian IRT (TIRT) model, and Triplet-2PLM are compared when applied to simulated and real triplet data. The new approach offers more flexible IRT modeling than the other two approaches under different assumptions, and the MGPPM is more statistically elegant than the TIRT and Triplet-2PLM.
{"title":"Multidimensional Generalized Partial Preference Model for Forced-Choice Items.","authors":"Daniel C Furr, Jianbin Fu","doi":"10.1017/psy.2025.10054","DOIUrl":"10.1017/psy.2025.10054","url":null,"abstract":"<p><p>A ranking pattern approach is proposed to build item response theory (IRT) models for forced-choice (FC) items. This new approach is an addition to the two existing approaches, sequential selection and Thurstone's law of pairwise comparison. A new dominance IRT model, the multidimensional generalized partial preference model (MGPPM), is proposed for FC items with any number (greater than 1) of statements. The maximum marginal likelihood estimation using an expectation-maximization algorithm (MML-EM) and Markov chain Monte Carlo (MCMC) estimation are developed. A simulation study is conducted to show satisfactory parameter recovery on triplet and tetrad data. The relationships between the newly proposed approach/model and the existing approaches/models are described, and the MGPPM, Thurstonian IRT (TIRT) model, and Triplet-2PLM are compared when applied to simulated and real triplet data. The new approach offers more flexible IRT modeling than the other two approaches under different assumptions, and the MGPPM is more statistically elegant than the TIRT and Triple-2PLM.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-30"},"PeriodicalIF":3.1,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805200/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145508067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational autoencoders (VAEs) are widely used to model high-dimensional latent variables in this context, but the limited expressiveness of their inference networks can still hinder performance. We introduce adversarial variational Bayes (AVB) and an importance-weighted extension (IWAVB) as more flexible inference algorithms for IFA. By combining VAEs with generative adversarial networks (GANs), AVB uses an auxiliary discriminator network to frame estimation as a two-player game and removes the restrictive standard normal assumption on the latent variables. Theoretically, AVB and IWAVB can achieve likelihoods that match or exceed those of VAEs and importance-weighted autoencoders (IWAEs). In exploratory analyses of empirical data, IWAVB attained higher likelihoods than IWAE, indicating greater expressiveness. In confirmatory simulations, IWAVB achieved comparable mean-square error in parameter recovery while consistently yielding higher likelihoods, and it clearly outperformed IWAE when the latent distribution was multimodal. These findings suggest that IWAVB can scale IFA to complex, large-scale, and potentially multimodal settings, supporting closer integration of psychometrics with modern multimodal data analysis.
{"title":"Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm.","authors":"Nanyu Luo, Feng Ji","doi":"10.1017/psy.2025.10059","DOIUrl":"10.1017/psy.2025.10059","url":null,"abstract":"<p><p>Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational autoencoders (VAEs) are widely used to model high-dimensional latent variables in this context, but the limited expressiveness of their inference networks can still hinder performance. We introduce adversarial variational Bayes (AVB) and an importance-weighted extension (IWAVB) as more flexible inference algorithms for IFA. By combining VAEs with generative adversarial networks (GANs), AVB uses an auxiliary discriminator network to frame estimation as a two-player game and removes the restrictive standard normal assumption on the latent variables. Theoretically, AVB and IWAVB can achieve likelihoods that match or exceed those of VAEs and importance-weighted autoencoders (IWAEs). In exploratory analyses of empirical data, IWAVB attained higher likelihoods than IWAE, indicating greater expressiveness. In confirmatory simulations, IWAVB achieved comparable mean-square error in parameter recovery while consistently yielding higher likelihoods, and it clearly outperformed IWAE when the latent distribution was multimodal. 
These findings suggest that IWAVB can scale IFA to complex, large-scale, and potentially multimodal settings, supporting closer integration of psychometrics with modern multimodal data analysis.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-24"},"PeriodicalIF":3.1,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805202/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145490878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M Marsman, L J Waldorp, N Sekulovski, J M B Haslbeck
{"title":"Bayes Factor Tests for Group Differences in Ordinal and Binary Graphical Models.","authors":"M Marsman, L J Waldorp, N Sekulovski, J M B Haslbeck","doi":"10.1017/psy.2025.10060","DOIUrl":"10.1017/psy.2025.10060","url":null,"abstract":"","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-49"},"PeriodicalIF":3.1,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805206/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145439986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cultural consensus theory (CCT) leverages shared knowledge between individuals to optimally aggregate answers to questions for which the underlying truth is unknown. Existing CCT models have predominantly focused on unidimensional point truths using dichotomous, polytomous, or continuous response formats. However, certain domains, such as risk assessment or interpretation of verbal quantifiers, may require a consensus focused on intervals, capturing a range of relevant values. We introduce the interval consensus model (ICM), a novel extension of CCT designed to estimate consensus intervals from continuous bounded interval responses. We use a Bayesian hierarchical modeling approach to estimate latent consensus intervals. In a simulation study, we show that, under the conditions studied, the ICM performs better than using simple means and medians of the responses. We then apply the model to empirical judgments of verbal quantifiers.
{"title":"The Interval Consensus Model: Aggregating Continuous Bounded Interval Responses.","authors":"Matthias Kloft, Björn S Siepe, Daniel W Heck","doi":"10.1017/psy.2025.10058","DOIUrl":"10.1017/psy.2025.10058","url":null,"abstract":"<p><p>Cultural consensus theory (CCT) leverages shared knowledge between individuals to optimally aggregate answers to questions for which the underlying truth is unknown. Existing CCT models have predominantly focused on unidimensional point truths using dichotomous, polytomous, or continuous response formats. However, certain domains, such as risk assessment or interpretation of verbal quantifiers, may require a consensus focused on intervals, capturing a range of relevant values. We introduce the interval consensus model (ICM), a novel extension of CCT designed to estimate consensus intervals from continuous bounded interval responses. We use a Bayesian hierarchical modeling approach to estimate latent consensus intervals. In a simulation study, we show that, under the conditions studied, the ICM performs better than using simple means and medians of the responses. We then apply the model to empirical judgments of verbal quantifiers.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-24"},"PeriodicalIF":3.1,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145440013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wataru Urasaki, Tomoyuki Nakagawa, Jun Tsuchida, Kouji Tahata
When the row and column variables of a two-way contingency table consist of the same categories, the table is called a square contingency table. Since square contingency tables have an association structure due to the concentration of observed values near the main diagonal, a primary objective is to examine symmetric relationships and transitions between variables. Various models and measures have been proposed to analyze these structures and to understand how the behavior of two variables changes across two time points or cohorts. This is necessary for a detailed investigation of individual categories and their interrelationships, such as shifts in brand preferences. We propose a novel approach to correspondence analysis (CA) for evaluating departures from symmetry in square contingency tables with nominal categories, using a modified divergence statistic. This approach ensures that well-known divergence statistics can also be visualized and that, regardless of the divergence statistic used, the CA plot consists of two principal axes with equal contribution rates. Notably, the scaling of the departures from symmetry provided by the modified divergence statistic is independent of sample size, allowing for meaningful comparisons and unification of results across different tables. Confidence regions are also constructed to enhance the accuracy of the CA plot.
{"title":"Visualization for Departures from Symmetry with the Power-Divergence-Type Measure in Square Contingency Tables.","authors":"Wataru Urasaki, Tomoyuki Nakagawa, Jun Tsuchida, Kouji Tahata","doi":"10.1017/psy.2025.10057","DOIUrl":"10.1017/psy.2025.10057","url":null,"abstract":"<p><p>When the row and column variables consist of the same category in a two-way contingency table, it is called a square contingency table. Since square contingency tables have an association structure due to the concentration of observed values near the main diagonal, a primary objective is to examine symmetric relationships and transitions between variables. Various models and measures have been proposed to analyze these structures to understand the changes between two variables' behavior at two-time points or cohorts. This is necessary for a detailed investigation of individual categories and their interrelationships, such as shifts in brand preferences. We propose a novel approach to correspondence analysis (CA) for evaluating departures from symmetry in square contingency tables with nominal categories, using a modified divergence statistic. This approach ensures that well-known divergence statistics can also be visualized and regardless of the divergence statistics used, the CA plot consists of two principal axes with equal contribution rates. Notably, the scaling of the departures from symmetry provided by the modified divergence statistic is independent of sample size, allowing for meaningful comparisons and unification of results across different tables. 
Confidence regions are also constructed to enhance the accuracy of the CA plot.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1-20"},"PeriodicalIF":3.1,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805207/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145432909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}