A Latent Markov Model for Noninvariant Measurements: An Application to Interaction Log Data From Computer-Interactive Assessments
Pub Date: 2025-09-01. Epub Date: 2025-08-26. DOI: 10.1017/psy.2025.10029. Psychometrika, pp. 1481-1505.
Hyeon-Ah Kang
The latent Markov model (LMM) has been increasingly used to analyze log data from computer-interactive assessments. An important consideration in applying the LMM to assessment data is the measurement effects of items. In educational and psychological assessment, items exhibit distinct psychometric qualities and induce systematic variance in assessment outcome data. Current developments in the LMM, however, assume that items have uniform effects and do not contribute to the variance of measurement outcomes. In this study, we propose a refinement of the LMM that relaxes the measurement invariance constraint and examine the empirical performance of the new framework through numerical experimentation. We modify the LMM for noninvariant measurements and refine the inferential scheme to accommodate event-specific measurement effects. Numerical experiments are conducted to validate the proposed inference methods and evaluate the performance of the new framework. Results suggest that the proposed inferential scheme performs adequately in retrieving the model parameters and state profiles. The new LMM framework demonstrated reliable and stable performance in modeling latent processes while appropriately accounting for items' measurement effects. Compared with the traditional scheme, the refined framework demonstrated greater relevance to real assessment data and yielded more robust inference results when the model was misspecified. The findings from the empirical evaluations suggest that the new framework has potential for serving large-scale assessment data that exhibit distinct measurement effects.
{"title":"A Latent Markov Model for Noninvariant Measurements: An Application to Interaction Log Data From Computer-Interactive Assessments.","authors":"Hyeon-Ah Kang","doi":"10.1017/psy.2025.10029","DOIUrl":"10.1017/psy.2025.10029","url":null,"abstract":"<p><p>The latent Markov model (LMM) has been increasingly used to analyze log data from computer-interactive assessments. An important consideration in applying the LMM to assessment data is measurement effects of items. In educational and psychological assessment, items exhibit distinct psychometric qualities and induce systematic variance to assessment outcome data. The current development in LMM, however, assumes that items have uniform effects and do not contribute to the variance of measurement outcomes. In this study, we propose a refinement of LMM that relaxes the measurement invariance constraint and examine empirical performance of the new framework through numerical experimentation. We modify the LMM for noninvariant measurements and refine the inferential scheme to accommodate the event-specific measurement effects. Numerical experiments are conducted to validate the proposed inference methods and evaluate the performance of the new framework. Results suggest that the proposed inferential scheme performs adequately well in retrieving the model parameters and state profiles. The new LMM framework demonstrated reliable and stable performance in modeling latent processes while appropriately accounting for items' measurement effects. Compared with the traditional scheme, the refined framework demonstrated greater relevance to real assessment data and yielded more robust inference results when the model was ill-specified. The findings from the empirical evaluations suggest that the new framework has potential for serving large-scale assessment data that exhibit distinct measurement effects.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1481-1505"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660023/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144978378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Transition Diagnostic Classification Models with Polya-Gamma Augmentation
Pub Date: 2025-09-01. Epub Date: 2025-08-08. DOI: 10.1017/psy.2025.10031. Psychometrika, pp. 1368-1399.
Joseph Resch, Samuel Baugh, Hao Duan, James Tang, Matthew J Madison, Michael Cotterell, Minjeong Jeon
Diagnostic classification models assume the existence of latent attribute profiles, the possession of which increases the probability of responding correctly to questions requiring the corresponding attributes. Longitudinally administered exams make it possible to assess the degree to which students acquire core attributes over time. While past approaches to longitudinal diagnostic classification modeling perform inference on the overall probability of acquiring particular attributes, there is particular interest in the relationship between student progression and student covariates such as intervention effects. To address this need, we propose an integrated Bayesian model for student progression in a longitudinal diagnostic classification modeling framework. Using Pólya-gamma augmentation with two logistic link functions, we achieve computationally efficient posterior estimation with a conditionally conjugate Gibbs sampling procedure. We show that this approach achieves accurate parameter recovery when evaluated on simulated data. We also demonstrate the method on a real-world educational testing data set.
{"title":"Bayesian Transition Diagnostic Classification Models with Polya-Gamma Augmentation.","authors":"Joseph Resch, Samuel Baugh, Hao Duan, James Tang, Matthew J Madison, Michael Cotterell, Minjeong Jeon","doi":"10.1017/psy.2025.10031","DOIUrl":"10.1017/psy.2025.10031","url":null,"abstract":"<p><p>Diagnostic classification models assume the existence of latent attribute profiles, the possession of which increases the probability of responding correctly to questions requiring the corresponding attributes. Through the use of longitudinally administered exams, the degree to which students are acquiring core attributes over time can be assessed. While past approaches to longitudinal diagnostic classification modeling perform inference on the overall probability of acquiring particular attributes, there is particular interest in the relationship between student progression and student covariates such as intervention effects. To address this need, we propose an integrated Bayesian model for student progression in a longitudinal diagnostic classification modeling framework. Using Pòlya-gamma augmentation with two logistic link functions, we achieve computationally efficient posterior estimation with a conditionally Gibbs sampling procedure. We show that this approach achieves accurate parameter recovery when evaluated using simulated data. We also demonstrate the method on a real-world educational testing data set.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1368-1399"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660026/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144800958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint Item Response Models for Manual and Automatic Scores on Open-Ended Test Items
Pub Date: 2025-09-01. Epub Date: 2025-06-16. DOI: 10.1017/psy.2025.10018. Psychometrika, pp. 1346-1367.
Daniel Bengs, Ulf Brefeld, Ulf Kroehne, Fabian Zehner
Test items using open-ended response formats can increase an instrument's construct validity. However, traditionally, their application in educational testing requires human coders to score the responses. Manual scoring not only increases operational costs but also prohibits the use of evidence from open-ended items to inform routing decisions in adaptive designs. Using machine learning and natural language processing, automatic scoring provides classifiers that can instantly assign scores to text responses. Although optimized for agreement with manual scores, automatic scoring is not perfectly accurate and introduces an additional source of error into the response process, leading to a misspecification of the measurement model used with the manual score. We propose two joint models for manual and automatic scores of automatically scored open-ended items. Our models extend a given model from Item Response Theory for the manual scores by a component for the automatic scores, accounting for classification errors. The models were evaluated using data from the Programme for International Student Assessment (2012) and simulated data, demonstrating their capacity to mitigate the impact of classification errors on ability estimation compared to a baseline that disregards classification errors.
{"title":"Joint Item Response Models for Manual and Automatic Scores on Open-Ended Test Items.","authors":"Daniel Bengs, Ulf Brefeld, Ulf Kroehne, Fabian Zehner","doi":"10.1017/psy.2025.10018","DOIUrl":"10.1017/psy.2025.10018","url":null,"abstract":"<p><p>Test items using open-ended response formats can increase an instrument's construct validity. However, traditionally, their application in educational testing requires human coders to score the responses. Manual scoring not only increases operational costs but also prohibits the use of evidence from open-ended items to inform routing decisions in adaptive designs. Using machine learning and natural language processing, automatic scoring provides classifiers that can instantly assign scores to text responses. Although optimized for agreement with manual scores, automatic scoring is not perfectly accurate and introduces an additional source of error into the response process, leading to a misspecification of the measurement model used with the manual score. We propose two joint models for manual and automatic scores of automatically scored open-ended items. Our models extend a given model from Item Response Theory for the manual scores by a component for the automatic scores, accounting for classification errors. The models were evaluated using data from the Programme for International Student Assessment (2012) and simulated data, demonstrating their capacity to mitigate the impact of classification errors on ability estimation compared to a baseline that disregards classification errors.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1346-1367"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660020/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144303615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evidence Factors in Fuzzy Regression Discontinuity Designs with Sequential Treatment Assignments
Pub Date: 2025-09-01. Epub Date: 2025-08-08. DOI: 10.1017/psy.2025.10033. Psychometrika, pp. 1400-1418.
Youjin Lee, Youmi Suk
Many observational studies involve multiple levels of treatment assignment. In particular, fuzzy regression discontinuity (RD) designs have sequential treatment assignment processes: first based on eligibility criteria and second on (non-)compliance rules. In such fuzzy RD designs, researchers typically use either an intent-to-treat approach or an instrumental-variable-type approach, and each is subject to both overlapping and unique biases. This article proposes a new evidence factors (EFs) framework for fuzzy RD designs with sequential treatment assignments, which may be influenced by different levels of decision-makers. Each of the proposed EFs aims to test the same causal null hypothesis while potentially being subject to different types of biases. Our proposed framework utilizes local RD randomization and randomization-based inference. We evaluate the effectiveness of our proposed framework through simulation studies and two real datasets on pre-kindergarten programs and testing accommodations.
{"title":"Evidence Factors in Fuzzy Regression Discontinuity Designs with Sequential Treatment Assignments.","authors":"Youjin Lee, Youmi Suk","doi":"10.1017/psy.2025.10033","DOIUrl":"10.1017/psy.2025.10033","url":null,"abstract":"<p><p>Many observational studies often involve multiple levels of treatment assignment. In particular, fuzzy regression discontinuity (RD) designs have sequential treatment assignment processes: first based on eligibility criteria, and second, on (non-)compliance rules. In such fuzzy RD designs, researchers typically use either an intent-to-treat approach or an instrumental variable-type approach, and each is subject to both overlapping and unique biases. This article proposes a new evidence factors (EFs) framework for fuzzy RD designs with sequential treatment assignments, which may be influenced by different levels of decision-makers. Each of the proposed EFs aims to test the same causal null hypothesis while potentially being subject to different types of biases. Our proposed framework utilizes the local RD randomization and randomization-based inference. We evaluate the effectiveness of our proposed framework through simulation studies and two real datasets on pre-kindergarten programs and testing accommodations.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1400-1418"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660022/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144800959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Structural Equation Envelope Model
Pub Date: 2025-09-01. Epub Date: 2025-08-08. DOI: 10.1017/psy.2025.10027. Psychometrika, pp. 1236-1257.
Chuchu Wang, Rongqian Sun, Xiangnan Feng, Xinyuan Song
The envelope model has gained significant attention since its proposal, offering a fresh perspective on dimension reduction in multivariate regression models and improving estimation efficiency. One of its appealing features is its adaptability to diverse regression contexts. This article introduces the integration of envelope methods into the factor analysis model. In contrast to previous research, which has focused primarily on the frequentist approach, this study proposes a Bayesian approach for estimation and envelope dimension selection. A Metropolis-within-Gibbs sampling algorithm is developed to draw posterior samples for Bayesian inference. A simulation study is conducted to illustrate the effectiveness of the proposed method. Additionally, the proposed methodology is applied to the ADNI dataset to explore the relationship between cognitive decline and changes occurring in various brain regions. This empirical application further highlights the practical utility of the proposed model in real-world scenarios.
{"title":"Bayesian Structural Equation Envelope Model.","authors":"Chuchu Wang, Rongqian Sun, Xiangnan Feng, Xinyuan Song","doi":"10.1017/psy.2025.10027","DOIUrl":"10.1017/psy.2025.10027","url":null,"abstract":"<p><p>The envelope model has gained significant attention since its proposal, offering a fresh perspective on dimension reduction in multivariate regression models and improving estimation efficiency. One of its appealing features is its adaptability to diverse regression contexts. This article introduces the integration of envelope methods into the factor analysis model. In contrast to previous research primarily focused on the frequentist approach, the study proposes a Bayesian approach for estimation and envelope dimension selection. A Metropolis-within-Gibbs sampling algorithm is developed to draw posterior samples for Bayesian inference. A simulation study is conducted to illustrate the effectiveness of the proposed method. Additionally, the proposed methodology is applied to the ADNI dataset to explore the relationship between cognitive decline and the changes occurring in various brain regions. This empirical application further highlights the practical utility of the proposed model in real-world scenarios.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1236-1257"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660024/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144800957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Beta Mixture Model for Careless Respondent Detection in Visual Analogue Scale Data
Pub Date: 2025-09-01. Epub Date: 2025-09-23. DOI: 10.1017/psy.2025.10041. Psychometrika, pp. 1558-1581.
Lijin Zhang, Benjamin W Domingue, Leonie V D E Vogelsmeier, Esther Ulitzsch
Visual analogue scales (VASs) are increasingly popular in psychological, social, and medical research. However, VASs can also be more demanding for respondents, potentially leading to quicker disengagement and a higher risk of careless responding. Existing mixture modeling approaches for careless response detection have so far only been available for Likert-type and unbounded continuous data and have not been tailored to VAS data. This study introduces and evaluates a model-based approach specifically designed to detect and account for careless respondents in VAS data. We integrate existing measurement models for VASs with mixture item response theory models for identifying and modeling careless responding. Simulation results show that the proposed model effectively detects careless responding and recovers key parameters. We illustrate the model's potential for identifying and accounting for careless responding using real data from both VASs and Likert scales. First, we show how the model can be used to compare careless responding across scale types, revealing a higher proportion of careless respondents in VAS than in Likert-scale data. Second, we demonstrate that item parameters from the proposed model exhibit improved psychometric properties compared to those from a model that ignores careless responding. These findings underscore the model's potential to enhance data quality by identifying and addressing careless responding.
{"title":"A Beta Mixture Model for Careless Respondent Detection in Visual Analogue Scale Data.","authors":"Lijin Zhang, Benjamin W Domingue, Leonie V D E Vogelsmeier, Esther Ulitzsch","doi":"10.1017/psy.2025.10041","DOIUrl":"10.1017/psy.2025.10041","url":null,"abstract":"<p><p>Visual Analogue scales (VASs) are increasingly popular in psychological, social, and medical research. However, VASs can also be more demanding for respondents, potentially leading to quicker disengagement and a higher risk of careless responding. Existing mixture modeling approaches for careless response detection have so far only been available for Likert-type and unbounded continuous data but have not been tailored to VAS data. This study introduces and evaluates a model-based approach specifically designed to detect and account for careless respondents in VAS data. We integrate existing measurement models for VASs with mixture item response theory models for identifying and modeling careless responding. Simulation results show that the proposed model effectively detects careless responding and recovers key parameters. We illustrate the model's potential for identifying and accounting for careless responding using real data from both VASs and Likert scales. First, we show how the model can be used to compare careless responding across different scale types, revealing a higher proportion of careless respondents in VAS compared to Likert scale data. Second, we demonstrate that item parameters from the proposed model exhibit improved psychometric properties compared to those from a model that ignores careless responding. These findings underscore the model's potential to enhance data quality by identifying and addressing careless responding.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1558-1581"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12672952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145126224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Teamwork Cognitive Diagnostic Modeling
Pub Date: 2025-09-01. Epub Date: 2025-08-08. DOI: 10.1017/psy.2025.10036. Psychometrika, pp. 1319-1345.
Peida Zhan, Zhimou Wang, Gaohong Chu, Haixin Qiao
Teamwork relies on collaboration to achieve goals that exceed individual capabilities, with team cognition playing a key role by integrating individual expertise and shared understanding. Identifying the causes of inefficiencies or poor team performance is critical for implementing targeted interventions and fostering the development of team cognition. This study proposes a teamwork cognitive diagnostic modeling framework comprising 12 specific models, collectively referred to as Team-CDMs. These models are designed to capture the interdependence among team members through emergent team cognitions by jointly modeling individual cognitive attributes and a team-level construct, termed teamwork quality, which reflects the social dimension of collaboration. The models can be used to identify strengths and weaknesses in team cognition and to determine whether poor performance arises from cognitive deficiencies or social issues. Two simulation studies were conducted to assess the psychometric properties of the models under diverse conditions, followed by a teamwork reasoning task to demonstrate their application. The results showed that Team-CDMs achieve robust parameter estimation, effectively diagnose individual attributes, and assess teamwork quality while pinpointing the causes of poor performance. These findings underscore the utility of Team-CDMs in understanding, diagnosing, and improving team cognition, offering a foundation for future research and practical applications in teamwork-based assessments.
{"title":"Teamwork Cognitive Diagnostic Modeling.","authors":"Peida Zhan, Zhimou Wang, Gaohong Chu, Haixin Qiao","doi":"10.1017/psy.2025.10036","DOIUrl":"10.1017/psy.2025.10036","url":null,"abstract":"<p><p>Teamwork relies on collaboration to achieve goals that exceed individual capabilities, with team cognition playing a key role by integrating individual expertise and shared understanding. Identifying the causes of inefficiencies or poor team performance is critical for implementing targeted interventions and fostering the development of team cognition. This study proposes a teamwork cognitive diagnostic modeling framework comprising 12 specific models-collectively referred to as Team-CDMs-which are designed to capture the interdependence among team members through emergent team cognitions by jointly modeling individual cognitive attributes and a team-level construct, termed <i>teamwork quality</i>, which reflects the social dimension of collaboration. The models can be used to identify strengths and weaknesses in team cognition and determine whether poor performance arises from cognitive deficiencies or social issues. Two simulation studies were conducted to assess the psychometric properties of the models under diverse conditions, followed by a teamwork reasoning task to demonstrate their application. The results showed that Team-CDMs achieve robust parameter estimation, effectively diagnose individual attributes, and assess teamwork quality while pinpointing the causes of poor performance. These findings underscore the utility of Team-CDMs in understanding, diagnosing, and improving team cognition, offering a foundation for future research and practical applications in teamwork-based assessments.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1319-1345"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12659997/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144800960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of Log Data From an International Online Educational Assessment System: A Multi-State Survival Modeling Approach to Reaction Time Between and Across Action Sequence
Pub Date: 2025-09-01. DOI: 10.1017/psy.2025.10043. Psychometrika, pp. 1506-1535.
Jina Park, Ick Hoon Jin, Minjeong Jeon
With increasingly available computer-based or online assessments, researchers have shown keen interest in analyzing log data to improve our understanding of test takers' problem-solving processes. In this article, we propose a multi-state survival model (MSM) for action sequence data from log files, focusing on modeling test takers' reaction times between actions in order to investigate which factors influence test takers' transition speed between actions, and how. We specifically identify the key actions that differentiate correct and incorrect answers, compare transition probabilities between these groups, and analyze their distinct problem-solving patterns. Through simulation studies and sensitivity analyses, we evaluate the robustness of our proposed model. We demonstrate the proposed approach using problem-solving items from the Programme for the International Assessment of Adult Competencies (PIAAC).
{"title":"Analysis of Log Data From an International Online Educational Assessment System: A Multi-State Survival Modeling Approach to Reaction Time Between and Across Action Sequence.","authors":"Jina Park, Ick Hoon Jin, Minjeong Jeon","doi":"10.1017/psy.2025.10043","DOIUrl":"10.1017/psy.2025.10043","url":null,"abstract":"<p><p>With increasingly available computer-based or online assessments, researchers have shown keen interest in analyzing log data to improve our understanding of test takers' problem-solving processes. In this article, we propose a multi-state survival model (MSM) to action sequence data from log files, focusing on modeling test takers' reaction times between actions, in order to investigate which factors and how they influence test takers' transition speed between actions. We specifically identify the key actions that differentiate correct and incorrect answers, compare transition probabilities between these groups, and analyze their distinct problem-solving patterns. Through simulation studies and sensitivity analyses, we evaluate the robustness of our proposed model. We demonstrate the proposed approach using problem-solving items from the Programme for the International Assessment of Adult Competencies (PIAAC).</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1506-1535"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660000/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144978850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Nonparametric Models for Multiple Raters: A General Statistical Framework
Pub Date: 2025-09-01. Epub Date: 2025-08-11. DOI: 10.1017/psy.2025.10035. Psychometrika, pp. 1445-1480.
Giuseppe Mignemi, Ioanna Manolopoulou
Rating procedures are crucial in many applied fields (e.g., educational, clinical, emergency). In these contexts, a rater (e.g., a teacher or doctor) scores a subject (e.g., a student or patient) on a rating scale. Given raters' variability, several statistical methods have been proposed for assessing and improving the quality of ratings. The analysis and estimation of the Intraclass Correlation Coefficient (ICC) are major concerns in such cases. As evidenced by the literature, the ICC might differ across subgroups of raters and might be affected by contextual factors and subject heterogeneity. Model estimation in the presence of heterogeneity has been one of the recent challenges in this line of research. Consequently, several methods have been proposed to address this issue under a parametric multilevel modelling framework, in which strong distributional assumptions are made. We propose a more flexible model under the Bayesian nonparametric (BNP) framework, in which most of those assumptions are relaxed. By eliciting hierarchical discrete nonparametric priors, the model accommodates clusters among raters and subjects, naturally accounts for heterogeneity, and improves the accuracy of estimates. We propose a general BNP heteroscedastic framework to analyze continuous and coarse rating data and possible latent differences among subjects and raters. The estimated densities are used to make inferences about the rating process and the quality of the ratings. By exploiting a stick-breaking representation of the discrete nonparametric priors, a general class of ICC indices can be derived for these models. Our method allows us to independently identify latent similarities between subjects and raters and can be applied in precise education to improve personalized teaching programs or interventions. Theoretical results about the ICC are provided together with computational strategies. Simulations and a real-world application are presented, and possible future directions are discussed.
{"title":"Bayesian Nonparametric Models for Multiple Raters: A General Statistical Framework.","authors":"Giuseppe Mignemi, Ioanna Manolopoulou","doi":"10.1017/psy.2025.10035","DOIUrl":"10.1017/psy.2025.10035","url":null,"abstract":"<p><p>Rating procedure is crucial in many applied fields (e.g., educational, clinical, emergency). In these contexts, a rater (e.g., teacher, doctor) scores a subject (e.g., student, doctor) on a rating scale. Given raters' variability, several statistical methods have been proposed for assessing and improving the quality of ratings. The analysis and the estimate of the Intraclass Correlation Coefficient (ICC) are major concerns in such cases. As evidenced by the literature, ICC might differ across different subgroups of raters and might be affected by contextual factors and subject heterogeneity. Model estimation in the presence of heterogeneity has been one of the recent challenges in this research line. Consequently, several methods have been proposed to address this issue under a parametric multilevel modelling framework, in which strong distributional assumptions are made. We propose a more flexible model under the Bayesian nonparametric (BNP) framework, in which most of those assumptions are relaxed. By eliciting hierarchical discrete nonparametric priors, the model accommodates clusters among raters and subjects, naturally accounts for heterogeneity, and improves estimates' accuracy. We propose a general BNP heteroscedastic framework to analyze continuous and coarse rating data and possible latent differences among subjects and raters. The estimated densities are used to make inferences about the rating process and the quality of the ratings. By exploiting a stick-breaking representation of the discrete nonparametric priors, a general class of ICC indices might be derived for these models. Our method allows us to independently identify latent similarities between subjects and raters and can be applied in <i>precise education</i> to improve personalized teaching programs or interventions. Theoretical results about the ICC are provided together with computational strategies. Simulations and a real-world application are presented, and possible future directions are discussed.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1445-1480"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660027/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144818305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A New Fit Assessment Framework for Common Factor Models Using Generalized Residuals
Pub Date: 2025-09-01. Epub Date: 2025-08-07. DOI: 10.1017/psy.2025.10037. Psychometrika, pp. 1419-1444.
Youjin Sung, Youngjin Han, Yang Liu
Assessing fit in common factor models solely through the lens of mean and covariance structures, as is commonly done with conventional goodness-of-fit (GOF) assessments, may overlook critical aspects of misfit, potentially leading to misleading conclusions. To achieve more flexible fit assessment, we extend the theory of generalized residuals (Haberman & Sinharay, 2013), originally developed for models with categorical data, to encompass more general measurement models. Within this extended framework, we propose several fit test statistics designed to evaluate various parametric assumptions involved in common factor models. Examples include assessing the distributional assumptions of latent variables and the functional form assumptions of individual manifest variables. The performance of the proposed statistics is examined through simulation studies and an empirical data analysis. Our findings suggest that generalized residuals are promising tools for detecting misfit in measurement models that is often masked when fit is assessed by conventional GOF testing methods.
{"title":"A New Fit Assessment Framework for Common Factor Models Using Generalized Residuals.","authors":"Youjin Sung, Youngjin Han, Yang Liu","doi":"10.1017/psy.2025.10037","DOIUrl":"10.1017/psy.2025.10037","url":null,"abstract":"<p><p>Assessing fit in common factor models solely through the lens of mean and covariance structures, as is commonly done with conventional goodness-of-fit (GOF) assessments, may overlook critical aspects of misfit, potentially leading to misleading conclusions. To achieve more flexible fit assessment, we extend the theory of generalized residuals (Haberman & Sinharay, 2013), originally developed for models with categorical data, to encompass more general measurement models. Within this extended framework, we propose several fit test statistics designed to evaluate various parametric assumptions involved in common factor models. The examples include assessing the distributional assumptions of latent variables and the functional form assumptions of individual manifest variables. The performance of the proposed statistics is examined through simulation studies and an empirical data analysis. Our findings suggest that generalized residuals are promising tools for detecting misfit in measurement models, often masked when assessed by conventional GOF testing methods.</p>","PeriodicalId":54534,"journal":{"name":"Psychometrika","volume":" ","pages":"1419-1444"},"PeriodicalIF":3.1,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12660002/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144796128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}