A Tutorial on Estimating the Precision of Individual Test Scores for Anyone Constructing and Using Psychological Tests
Julius M Pfadt, Dylan Molenaar, Petra Hurks, Klaas Sijtsma
Psychometrika, pp. 1-35. DOI: 10.1017/psy.2026.10081. Published 2026-01-09.
Regularized Joint Maximum Likelihood Estimation of Latent Space Item Response Models
Dylan Molenaar, Minjeong Jeon
Psychometrika, pp. 1-25. DOI: 10.1017/psy.2025.10068. Published 2026-01-09.

In latent space item response models (LSIRMs), subjects and items are embedded in a low-dimensional Euclidean latent space. As such, interactions among persons and/or items that are unmodeled in conventional item response theory models can be revealed. The current estimation approach for LSIRMs is a fully Bayesian procedure using Markov chain Monte Carlo, which, while practical, is computationally challenging and hampers applied researchers from using the models in a wide range of settings. Therefore, we propose an LSIRM based on two variants of regularized joint maximum likelihood (JML) estimation: penalized JML and constrained JML. Owing to the absence of integrals in the likelihood, the JML methods allow various models to be fit in a limited amount of time. This computational speed facilitates a practical extension of LSIRMs to ordinal data and makes it possible to select the dimensionality of the latent space using cross-validation. In this study, we derive the two JML approaches and address the issues that arise when using maximum likelihood to estimate the LSIRM. We present a simulation study demonstrating acceptable parameter recovery and adequate performance of the cross-validation procedure. In addition, we estimate several binary and ordinal LSIRMs on real datasets pertaining to deductive reasoning and personality. All methods are implemented in the R package 'LSMjml', which is available from CRAN.
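The penalized JML variant named in the abstract is specific to the authors' derivation, but the general idea of maximizing a joint likelihood over person and item parameters with a ridge-style penalty on the latent positions can be sketched as follows. This is a minimal illustration on simulated data: the model form (logit of theta_i + beta_j minus Euclidean distance), the penalty weight `lam`, and the optimizer settings are assumptions here, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_persons, n_items, dim = 30, 10, 2

# Simulate toy binary responses from a latent space model:
# logit P(y_ij = 1) = theta_i + beta_j - ||z_i - w_j||
theta = rng.normal(0, 1, n_persons)
beta = rng.normal(0, 1, n_items)
z = rng.normal(0, 0.5, (n_persons, dim))
w = rng.normal(0, 0.5, (n_items, dim))
dist = np.linalg.norm(z[:, None, :] - w[None, :, :], axis=2)
logits = theta[:, None] + beta[None, :] - dist
y = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logits))).astype(float)

def unpack(x):
    th = x[:n_persons]
    be = x[n_persons:n_persons + n_items]
    rest = x[n_persons + n_items:]
    zz = rest[:n_persons * dim].reshape(n_persons, dim)
    ww = rest[n_persons * dim:].reshape(n_items, dim)
    return th, be, zz, ww

def penalized_nll(x, lam=0.1):
    # Joint (person + item) negative log-likelihood -- no integral over the
    # latent variables is needed -- plus a ridge penalty on the positions
    th, be, zz, ww = unpack(x)
    lg = th[:, None] + be[None, :] - np.linalg.norm(zz[:, None, :] - ww[None, :, :], axis=2)
    ll = np.sum(y * lg - np.logaddexp(0.0, lg))
    return -ll + lam * (np.sum(zz ** 2) + np.sum(ww ** 2))

x0 = rng.normal(0, 0.1, n_persons + n_items + dim * (n_persons + n_items))
res = minimize(penalized_nll, x0, method="L-BFGS-B", options={"maxiter": 200})
```

Because every parameter (person and item) is a free argument of one likelihood, a single quasi-Newton run replaces the MCMC sampling that makes the fully Bayesian approach slow.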
Estimating Latent Distribution of Item Response Theory Using Kernel Density Method
Seewoo Li, Guemin Lee
Psychometrika, pp. 1-21. DOI: 10.1017/psy.2026.10080. Published 2026-01-08.

The article proposes a new approach to estimating the latent distribution in item response theory (IRT) using kernel density estimation (KDE), in particular the solve-the-equation (STE) bandwidth selection algorithm developed by Sheather and Jones (1991). As with existing methods, the KDE method aims to estimate the latent distribution so as to reduce biases in parameter estimates when the normality assumption on the latent variable is violated. Simulation studies and an empirical example confirm the robust algorithmic convergence of the KDE approach and show that it yields parameter estimates that are more accurate than or comparable to existing methods. Unlike other approaches that require multiple model fits for smoothing-parameter selection, KDE requires only a single model-fitting step, substantially reducing computation time. These findings highlight KDE as a practical and efficient method for estimating latent distributions in IRT.
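As a rough illustration of the kernel density idea (not the paper's method): the Sheather-Jones STE bandwidth selector is not available in SciPy, so this sketch substitutes Silverman's rule, and the skewed sample standing in for provisional ability estimates is simulated.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Skewed sample standing in for provisional latent-trait estimates,
# i.e., a latent distribution that violates the normality assumption
theta_hat = rng.gamma(shape=2.0, scale=1.0, size=500) - 2.0

# KDE of the latent density; the paper uses the Sheather-Jones
# solve-the-equation bandwidth, Silverman's rule is used here instead
kde = gaussian_kde(theta_hat, bw_method="silverman")
grid = np.linspace(-4.0, 6.0, 81)
density = kde(grid)

# Renormalized quadrature weights, the form in which an estimated latent
# distribution is typically plugged into marginal maximum likelihood
weights = density / density.sum()
```

The single KDE evaluation above is the whole smoothing step, which is the source of the speed advantage the abstract claims over methods that refit the model for each candidate smoothing parameter.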
Constructive Q-Matrix Identifiability via Novel Tensor Unfolding
Yuqi Gu
Psychometrika, pp. 1-20. DOI: 10.1017/psy.2025.10078. Published 2026-01-06.

This work establishes a new identifiability theory for a cornerstone of various cognitive diagnostic models (CDMs) popular in psychometrics: the Q-matrix. The key idea is a novel tensor-unfolding proof strategy. Representing the joint distribution of J categorical responses as a J-way tensor, we strategically unfold the tensor into matrices in multiple ways and use their rank properties to identify the unknown Q-matrix. This approach departs fundamentally from all prior identifiability analyses in CDMs. Our proof is constructive, elucidating a population-level procedure to exactly recover the Q-matrix within a parameter space where each latent attribute is measured by at least two "pure" items that solely measure this attribute. The theory has several desirable features: it can constructively identify both the Q-matrix and the number of latent attributes; it applies to broad classes of linear and nonlinear CDMs with main or all saturated effects of attributes; and it accommodates polytomous responses, extending beyond classical binary response settings. The new identifiability result unifies and strengthens identifiability guarantees across diverse CDMs. It provides rigorous theoretical foundations and indicates a future pathway toward using tensor unfolding for practical Q-matrix estimation.
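The central object in the tensor-unfolding argument, representing the joint distribution of J categorical responses as a J-way tensor and matricizing it, is easy to reproduce numerically. Below is an illustrative sketch, not the paper's identification procedure: a toy latent-class mixture with C = 3 classes and J = 4 binary items, whose mode-{1,2} unfolding has rank bounded by C.

```python
import numpy as np

rng = np.random.default_rng(2)
J, C = 4, 3                          # 4 binary items, 3 latent classes
pi = np.array([0.3, 0.4, 0.3])       # class proportions
P = rng.random((C, J)) * 0.8 + 0.1   # P[c, j] = P(item j correct | class c)

# Joint distribution of the J responses as a J-way (2 x 2 x 2 x 2) tensor
T = np.zeros((2,) * J)
for c in range(C):
    outer = np.array([1 - P[c, 0], P[c, 0]])
    for j in range(1, J):
        outer = np.multiply.outer(outer, np.array([1 - P[c, j], P[c, j]]))
    T += pi[c] * outer

def unfold(tensor, row_modes):
    """Matricize: rows indexed by row_modes, columns by the remaining modes."""
    rest = [m for m in range(tensor.ndim) if m not in row_modes]
    t = np.transpose(tensor, list(row_modes) + rest)
    n_rows = int(np.prod([tensor.shape[m] for m in row_modes]))
    return t.reshape(n_rows, -1)

M = unfold(T, [0, 1])                # items {1,2} against items {3,4}
rank = np.linalg.matrix_rank(M, tol=1e-10)
```

Because M is a sum of C rank-one terms (one per latent class), its rank cannot exceed the number of classes; rank properties of different unfoldings of this kind are what the constructive identifiability proof exploits.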
Plausible and Proper Multiple-Choice Items for Diagnostic Classification
Chia-Yi Chiu, Hans Friedrich Koehn, Yu Wang
Psychometrika, pp. 1-43. DOI: 10.1017/psy.2025.10074. Published 2025-12-19.
Psychometric Model Framework for Multiple Response Items
Wenjie Zhou, Lei Guo
Psychometrika, pp. 1-33. DOI: 10.1017/psy.2025.10073. Published 2025-12-19.

Multiple response (MR) items, such as multiple true-false, multiple-select, and select-N items, are increasingly used in assessments to identify partial knowledge and differentiate latent abilities more accurately. By allowing multiple selections, MR items provide richer information and reduce guessing effects compared to single-answer multiple-choice items. However, traditional scoring methods (e.g., Dichotomous, Ripkey, and Partial scoring) compress response combination (RC) data, losing valuable information and ignoring issues such as local dependence and incompatibility across item types. To address these challenges, we introduce a novel psychometric modeling framework: the Multiple Response Model with Inter-option Local Dependencies (MRM-LD) and its simplified version, the Multiple Response Model (MRM). These models preserve RC data across MR item types, offering a more comprehensive view of MR assessment. Parameters for MRM-LD and MRM were estimated using Markov chain Monte Carlo algorithms in Stan and R. Empirical data from an eighth-grade physics test showed that MRM-LD and MRM outperform the Graded Response Model and the Nominal Response Model combined with the three scoring methods by retaining more test information, improving reliability and validity, and providing a more detailed analysis of item characteristics. Simulation studies confirmed that the proposed models perform robustly under various conditions, including small samples and few items, demonstrating their applicability across diverse testing scenarios.
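To make the response-combination (RC) idea concrete, here is a small hypothetical illustration (not the authors' model code) of how a 4-option multiple-select item yields 2^4 = 16 nominal RC categories, and how the Dichotomous and Partial scoring rules mentioned above compress them; the answer key and the response shown are made up.

```python
from itertools import product

# All response combinations (RCs) of a 4-option multiple-select item
n_options = 4
rcs = list(product([0, 1], repeat=n_options))   # 16 combinations
rc_index = {rc: k for k, rc in enumerate(rcs)}

key = (1, 0, 1, 0)        # hypothetical answer key: options 1 and 3 correct
response = (1, 0, 0, 0)   # respondent selected only option 1

# Modeling the full RC keeps the whole selection pattern as one category
category = rc_index[response]

# Traditional compressions lose that pattern:
dichotomous = int(response == key)                                 # all-or-nothing
partial = sum(r == k for r, k in zip(response, key)) / n_options   # per-option credit
```

Here the Dichotomous score (0) and the Partial score (0.75) both discard which options were selected, whereas the RC category preserves the exact pattern that an RC-level model like the proposed framework can exploit.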
Reducing Differential Item Functioning via Process Data
Ling Chen, Susu Zhang, Jingchen Liu
Psychometrika, pp. 1-36. DOI: 10.1017/psy.2025.10072. Published 2025-12-10.

Test fairness is a major concern in psychometric and educational research. A typical approach to ensuring test fairness is differential item functioning (DIF) analysis. DIF arises when a test item functions differently across subgroups, which are typically defined by the respondents' demographic characteristics. Most existing research focuses on the statistical detection of DIF; less attention has been given to reducing or eliminating it. Meanwhile, computer-based assessments have become increasingly popular. The data generated as respondents interact with an item are recorded in computer log files and are referred to as process data. In this article, we propose a novel method within the framework of generalized linear models that leverages process data to reduce and understand DIF. Specifically, we construct a nuisance-trait surrogate from features extracted from process data. With the constructed nuisance trait, we introduce a new scoring rule that incorporates, in addition to the target latent trait, respondents' behaviors captured through process data. We demonstrate the efficiency of our approach through extensive simulation experiments and an application to 13 Problem Solving in Technology-Rich Environments items from the 2012 Programme for the International Assessment of Adult Competencies (PIAAC).
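The mechanics of adjusting for a nuisance trait can be illustrated with a toy logistic model. This is a simplified stand-in, not the authors' generalized linear model: a single process-data feature `eta` is linked to group membership and drives responses, so the apparent group effect in a naive fit shrinks once the surrogate is included.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 4000
group = rng.integers(0, 2, n)               # demographic subgroup indicator
theta = rng.normal(0, 1, n)                 # target latent trait
eta = 0.8 * group + rng.normal(0, 1, n)     # nuisance trait, group-linked
logits = 1.2 * theta - 1.0 * eta            # responses depend on theta and eta,
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)  # not on group itself

def fit_logistic(X, y):
    # Maximum likelihood logistic regression via a smooth NLL
    def nll(b):
        lg = X @ b
        return -np.sum(y * lg - np.logaddexp(0.0, lg))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

ones = np.ones(n)
# Naive model: group looks like it has an effect (apparent DIF)
b_naive = fit_logistic(np.column_stack([ones, theta, group]), y)
# Adding the process-data surrogate absorbs the spurious group effect
b_adj = fit_logistic(np.column_stack([ones, theta, group, eta]), y)
```

The group coefficient (index 2) is sizable in `b_naive` but near zero in `b_adj`, which mirrors the logic of explaining away DIF through a process-data nuisance trait rather than merely detecting it.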
Bayesian Selection Policies for Human-in-the-Loop Anomaly Detectors with Applications in Test Security
Michael Fauss, Xiang Liu, Chen Li, Ikkyu Choi, H Vincent Poor
Psychometrika, pp. 1-33. DOI: 10.1017/psy.2025.10056. Published 2025-12-10.

This article investigates the problem of automatically flagging test takers who exhibit atypical responses or behaviors for further review by human experts. The objective is to develop a selection policy that maximizes the expected number of test takers correctly identified as warranting additional scrutiny while maintaining a manageable volume of reviews per test administration. The selection procedure should learn from the outcomes of the expert reviews. Since typically only a fraction of test takers are reviewed, this leads to a semi-supervised learning problem. The latter is formalized in a Bayesian setting, and the corresponding optimal selection policy is derived. Since calculating the policy and the underlying posterior distributions is computationally infeasible, a variational approximation and three heuristic selection policies are proposed. These policies are informed by properties of the optimal policy and correspond to different exploration/exploitation trade-offs. The performance of the approximate policies is assessed via numerical experiments using both synthetic and real-world data and is compared with procedures based on off-the-shelf algorithms as well as theoretical performance bounds.
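One simple exploration/exploitation heuristic in this spirit (a generic Thompson-sampling sketch, not one of the paper's proposed policies): suppose flagged test takers fall into risk bins with unknown confirmation rates, and each administration a fixed review budget is spent on the bin whose sampled Beta posterior looks most promising; expert outcomes then update the posterior.

```python
import numpy as np

rng = np.random.default_rng(4)
true_rate = np.array([0.05, 0.15, 0.45])   # unknown P(expert confirms) per risk bin
alpha = np.ones(3)                          # Beta(1, 1) prior on each rate
beta = np.ones(3)
budget, rounds = 10, 200
confirmed = 0

for _ in range(rounds):
    # Thompson sampling: draw a plausible rate for each bin and spend the
    # whole review budget on the bin that currently looks most promising
    draws = rng.beta(alpha, beta)
    k = int(np.argmax(draws))
    hits = int((rng.random(budget) < true_rate[k]).sum())  # expert outcomes
    alpha[k] += hits
    beta[k] += budget - hits
    confirmed += hits

reviews_per_bin = alpha + beta - 2.0   # how often each bin was reviewed
```

The sampling step keeps some probability on under-reviewed bins (exploration) while concentrating the budget on the bin with the highest estimated confirmation rate (exploitation), the trade-off the abstract's heuristic policies are designed to navigate.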
SELF-Tree: An Interpretable Model for Multivariate Causal Direction Heterogeneity Analysis
Zhifei Li, Hongbo Wen
Psychometrika, pp. 1-52. DOI: 10.1017/psy.2025.10067. Published 2025-12-10.