Pub Date: 2024-06-01. Epub Date: 2024-02-19. DOI: 10.1007/s11336-024-09947-8
Minerva Mukhopadhyay, Jacie R McHaney, Bharath Chandrasekaran, Abhra Sarkar
Understanding how the adult human brain learns novel categories is an important problem in neuroscience. Drift-diffusion models are popular in such contexts for their ability to mimic the underlying neural mechanisms. One such model for gradual longitudinal learning was recently developed in Paulon et al. (J Am Stat Assoc 116:1114-1127, 2021). In practice, category response accuracies are often the only reliable measure recorded by behavioral scientists to describe human learning. To our knowledge, however, drift-diffusion models for such scenarios have never been considered in the literature before. To address this gap, in this article, we build carefully on Paulon et al. (2021), but now with latent response times integrated out, to derive a novel, biologically interpretable class of 'inverse-probit' categorical probability models for observed categories alone. This new marginal model, however, presents significant identifiability and inferential challenges not encountered for the original joint model in Paulon et al. (2021). We address these new challenges using a novel projection-based approach with a symmetry-preserving identifiability constraint that allows us to work with conjugate priors in an unconstrained space. We adapt the model for group- and individual-level inference in longitudinal settings. Building again on the model's latent variable representation, we design an efficient Markov chain Monte Carlo algorithm for posterior computation. We evaluate the empirical performance of the method through simulation experiments. Its practical efficacy is illustrated in applications to longitudinal tone learning studies.
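The 'inverse-probit' name comes from the inverse Gaussian (Wald) distribution of drift-diffusion first-passage times. As a rough illustration, not the authors' model, the sketch below simulates a two-accumulator race with hypothetical drift and boundary values and recovers the marginal category probability by averaging over the latent response times:

```python
import numpy as np

rng = np.random.default_rng(0)

def race_accuracy(drift_correct, drift_wrong, boundary=2.0, n=20000):
    # First-passage time of a single drift-diffusion accumulator is
    # inverse-Gaussian (Wald) with mean boundary/drift and shape boundary^2.
    t_correct = rng.wald(boundary / drift_correct, boundary ** 2, size=n)
    t_wrong = rng.wald(boundary / drift_wrong, boundary ** 2, size=n)
    # The observed category is the accumulator that hits its boundary first;
    # marginalizing over the latent times gives the category probability.
    return np.mean(t_correct < t_wrong)

acc_early = race_accuracy(1.0, 0.8)  # similar drifts: modest accuracy
acc_late = race_accuracy(2.0, 0.8)   # learning raises the correct drift
```

Raising the drift of the correct accumulator, as learning does over sessions in the longitudinal model, increases the marginal response accuracy.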
"Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning." Psychometrika, pp. 461-485.
Pub Date: 2024-06-01. Epub Date: 2024-02-13. DOI: 10.1007/s11336-024-09950-z
Stefano Noventa, Sangbeak Ye, Augustin Kelava, Andrea Spoto
The present work aims to show that the identification problems (here meaning both empirical indistinguishability and unidentifiability) of some item response theory models are related to the notion of identifiability in knowledge space theory. Specifically, the identification problems of the 3- and 4-parameter models are related to the more general issues of forward- and backward-gradedness in all items of the power set, which is the knowledge structure associated with IRT models under the assumption of local independence. As a consequence, the identifiability problem of a 4-parameter model splits into two parts: the first results from a trade-off between the left-side added parameters and the remainder of the item response function, e.g., a 2-parameter model, and the second is the already well-known identifiability issue of the 2-parameter model itself. Applying the results to the logistic case appears to both confirm and generalize the current findings in the literature for fixed- and random-effects IRT logistic models.
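For orientation, here is a minimal sketch of the 4-parameter logistic item response function with hypothetical parameter values; the left-side parameters c and d (guessing and slipping asymptotes) are the ones implicated in the trade-off described above:

```python
import numpy as np

def irf_4pl(theta, a, b, c, d):
    # 4PL: a 2PL core with lower asymptote c (guessing) and upper
    # asymptote d (slipping); c = 0, d = 1 recovers the 2PL.
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4.0, 4.0, 9)
p = irf_4pl(theta, a=1.2, b=0.0, c=0.2, d=0.95)
```

The curve is strictly increasing and confined to the band (c, d), which is why the added asymptotes can partly absorb changes in the 2PL core (a, b).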
"On the Identifiability of 3- and 4-Parameter Item Response Theory Models From the Perspective of Knowledge Space Theory." Psychometrika, pp. 486-516. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11164782/pdf/
Pub Date: 2024-06-01. Epub Date: 2024-02-15. DOI: 10.1007/s11336-024-09951-y
Ling Chen, Yuqi Gu
Grade of membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. Therefore, GoM models have a richer modeling capacity than latent class models that restrict each subject to belong to a single profile. The flexibility of GoM comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD)-based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a huge computational advantage over Bayesian or likelihood-based methods and is scalable to large-scale and high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate our method by applying it to a personality test dataset.
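The low-rank observation the method hinges on can be sketched as follows; all dimensions and parameter values are hypothetical, and this reproduces only the starting point of the spectral approach, not the full estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, K = 500, 40, 3  # subjects, items, extreme profiles

# Mixed-membership scores on the simplex (rows sum to 1) and the extreme
# profiles' item response probabilities.
Pi = rng.dirichlet(np.ones(K), size=N)
Theta = rng.uniform(0.1, 0.9, size=(K, J))

P = Pi @ Theta            # expectation of the binary data matrix
R = rng.binomial(1, P)    # observed responses

# The expectation has rank at most K, so the spectral approach retains
# only the K leading singular vectors of the observed matrix R.
s_exp = np.linalg.svd(P, compute_uv=False)
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_K = U[:, :K]            # leading left singular vectors, N x K
```

The simplex geometry of rows of such leading singular vectors is what the paper exploits to estimate the membership scores.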
"A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses." Psychometrika, pp. 626-657.
Pub Date: 2024-06-01. Epub Date: 2024-04-01. DOI: 10.1007/s11336-024-09960-x
Kylie Gorney, Sandip Sinharay, Carol Eckerly
Many popular person-fit statistics belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since T is computed using (a) an estimated ability parameter and (b) a finite number of items. Snijders (Psychometrika 66(3):331-342, 2001) developed mean and variance corrections for T to account for the use of an estimated ability parameter. Bedrick (Psychometrika 62(2):191-199, 1997) and Molenaar and Hoijtink (Psychometrika 55(1):75-106, 1990) developed skewness corrections for T to account for the use of a finite number of items. In this paper, we combine these two lines of research and propose three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. We conducted a detailed simulation study and found that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. A real data example is also included.
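As a sketch of the class of statistics being corrected, the block below computes the common standardized log-likelihood person-fit statistic with ability treated as known; none of the paper's corrections are applied, and the item probabilities are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def lz_statistic(x, p):
    # Unstandardized statistic: log-likelihood of the response pattern.
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # Null mean and variance, treating ability as known; the corrections in
    # the paper adjust for an estimated ability and a finite test length.
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - mean) / np.sqrt(var)

p = rng.uniform(0.3, 0.9, size=40)                  # model-implied probabilities
x_typical = (rng.uniform(size=40) < p).astype(int)  # model-consistent pattern
x_aberrant = 1 - x_typical                          # reversed, aberrant pattern

z_typical = lz_statistic(x_typical, p)
z_aberrant = lz_statistic(x_aberrant, p)
```

An aberrant (here, reversed) pattern yields a large negative value, while a model-consistent pattern stays near zero; the paper's corrections make the reference distribution for such statistics more accurate.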
"Efficient Corrections for Standardized Person-Fit Statistics." Psychometrika, pp. 569-591.
Pub Date: 2024-06-01. Epub Date: 2023-11-16. DOI: 10.1007/s11336-023-09936-3
Yang Liu, Weimeng Wang
It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores than those conventionally predicted from responses alone. For this purpose, a simple-structure factor model is often preferred, as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for responses intact. The added speed factor indicated by item-level RT correlates with the ability factor in the IRT model, allowing RT data to carry additional information about respondents' ability. However, parametric simple-structure factor models are often restrictive and fit poorly to empirical data, which casts doubt on the suitability of a simple factor structure. In the present paper, we analyze the 2015 Programme for International Student Assessment mathematics data using a semiparametric simple-structure model. We conclude that a simple factor structure attains a decent fit once further parametric assumptions in the measurement model are sufficiently relaxed. Furthermore, our semiparametric model implies that the association between latent ability and speed/slowness is strong in the population, but the form of the association is nonlinear. It follows that scoring based on the fitted model can substantially improve the precision of ability scores.
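A toy sketch of why item-level RT can sharpen ability scores under a simple-structure model; the linear ability-slowness link and all parameter values below are assumptions for illustration only (the paper's point is precisely that the link can be nonlinear):

```python
import numpy as np

rng = np.random.default_rng(3)
n, J = 4000, 20

# Correlated ability (theta) and slowness (tau).
theta = rng.normal(size=n)
tau = -0.6 * theta + 0.8 * rng.normal(size=n)

# Simple structure: responses load only on theta, log-RTs only on tau.
p = 1.0 / (1.0 + np.exp(-theta[:, None]))
x = (rng.uniform(size=(n, J)) < p).astype(int)
log_rt = tau[:, None] + 0.5 * rng.normal(size=(n, J))

score = x.mean(axis=1)
mean_lrt = log_rt.mean(axis=1)

def r_squared(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

# Predicting the latent ability from responses alone vs. responses plus RT.
ones = np.ones((n, 1))
r2_resp = r_squared(np.hstack([ones, score[:, None]]), theta)
r2_joint = r_squared(np.hstack([ones, score[:, None], mean_lrt[:, None]]), theta)
```

Because tau carries information about theta, adding the RT summary improves the recovery of ability, which is the mechanism the joint analysis exploits.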
"What Can We Learn from a Semiparametric Factor Analysis of Item Responses and Response Time? An Illustration with the PISA 2015 Data." Psychometrika, pp. 386-410.
Pub Date: 2024-04-28. DOI: 10.1007/s11336-024-09971-8
Chun Wang
Modern assessment demands, resulting from educational reform efforts, call for strengthening diagnostic testing capabilities to identify not only the understanding of expected learning goals but also related intermediate understandings that are steppingstones on pathways to learning goals. An accurate and nuanced way of interpreting assessment results will allow subsequent instructional actions to be targeted. An appropriate psychometric model is indispensable in this regard. In this study, we developed a new psychometric model, namely, the diagnostic facet status model (DFSM), which belongs to the general class of cognitive diagnostic models (CDM), but with two notable features: (1) it simultaneously models students’ target understanding (i.e., goal facet) and intermediate understanding (i.e., intermediate facet); and (2) it models every response option, rather than merely right or wrong responses, so that each incorrect response uniquely contributes to discovering students’ facet status. Given that some combination of goal and intermediate facets may be impossible due to facet hierarchical relationships, a regularized expectation–maximization algorithm (REM) was developed for model estimation. A log-penalty was imposed on the mixing proportions to encourage sparsity. As a result, those impermissible latent classes had estimated mixing proportions equal to 0. A heuristic algorithm was proposed to infer a facet map from the estimated permissible classes. A simulation study was conducted to evaluate the performance of REM to recover facet model parameters and to identify permissible latent classes. A real data analysis was provided to show the feasibility of the model.
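One generic way a log-penalty on mixing proportions zeroes out impermissible or empty classes in an M-step is sketched below; this is a standard sparsity-inducing EM update, not necessarily the paper's exact REM:

```python
import numpy as np

def penalized_mixing_update(counts, lam):
    # M-step under the penalty -lam * sum_k log(pi_k): maximizing
    # sum_k (counts_k - lam) * log(pi_k) over the simplex pushes any class
    # whose expected count falls below lam to the boundary pi_k = 0.
    w = np.maximum(counts - lam, 0.0)
    return w / w.sum()

counts = np.array([400.0, 250.0, 30.0, 5.0])  # expected class counts (E-step)
pi = penalized_mixing_update(counts, lam=50.0)
```

Classes whose posterior support never exceeds the penalty threshold get exactly zero mixing proportion, so the surviving classes can be read off as the permissible latent classes.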
"A Diagnostic Facet Status Model (DFSM) for Extracting Instructionally Useful Information from Diagnostic Assessment." Psychometrika.
Pub Date: 2024-04-25. DOI: 10.1007/s11336-024-09973-6
Paolo Girardi, Anna Vesely, Daniël Lakens, Gianmarco Altoè, Massimiliano Pastore, Antonio Calcagnì, Livio Finos
When analyzing data, researchers make choices that are either arbitrary, based on subjective beliefs about the data-generating process, or for which equally justifiable alternatives could have been made. This wide range of data-analytic choices can be abused and has been one of the underlying causes of the replication crisis in several fields. The recently introduced multiverse analysis provides researchers with a method to evaluate the stability of results across the reasonable choices that could be made when analyzing data. Multiverse analysis, however, is confined to a descriptive role, lacking a proper and comprehensive inferential procedure. Specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model and only allows researchers to infer whether at least one specification rejects the null hypothesis, not which specifications should be selected. In this paper, we present a Post-selection Inference approach to Multiverse Analysis (PIMA), a flexible and general inferential approach that considers all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e., preprocessing choices) and any generalized linear model; it tests the null hypothesis that a given predictor is not associated with the outcome by combining information from all reasonable models of the multiverse analysis, and it provides strong control of the family-wise error rate, allowing researchers to claim that the null hypothesis can be rejected for any specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. We formally prove that the Type I error rate is controlled and compute the statistical power of the test through a simulation study. Finally, we apply the PIMA procedure to the analysis of a real dataset on self-reported hesitancy for the COronaVIrus Disease 2019 (COVID-19) vaccine before and after the 2020 lockdown in Italy. We conclude with practical recommendations to be considered when implementing the proposed procedure.
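A minimal sketch of a sign-flipping score test for a single specification, the building block named in the title; the data and effect size are hypothetical, and the full PIMA procedure combines such tests across all specifications of the multiverse:

```python
import numpy as np

rng = np.random.default_rng(4)
n, B = 200, 999

x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)  # simulated data with a real effect

# Score contributions for H0: beta = 0 in a Gaussian linear model with an
# intercept; under H0 they are (asymptotically) sign-symmetric.
v = x * (y - y.mean())
t_obs = v.sum()

# Random sign flips generate the null distribution of the score statistic.
flips = rng.choice([-1.0, 1.0], size=(B, n))
t_null = (flips * v).sum(axis=1)
p_value = (1 + np.sum(np.abs(t_null) >= abs(t_obs))) / (B + 1)
```

Using the same flip matrix across all specifications preserves their dependence, which is what makes a max-type combination and family-wise error control possible.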
"Post-selection Inference in Multiverse Analysis (PIMA): An Inferential Framework Based on the Sign Flipping Score Test." Psychometrika.
Pub Date: 2024-04-24. DOI: 10.1007/s11336-024-09972-7
Yinqiu He
Nonparametric item response models provide a flexible framework in psychological and educational measurement. Douglas (Psychometrika 66(4):531-540, 2001) established asymptotic identifiability for a class of models with nonparametric response functions for long assessments. Nevertheless, the model class examined in Douglas (2001) excludes several popular parametric item response models. This limitation can hinder applications in which nonparametric and parametric models are compared, such as evaluating model goodness-of-fit. To address this issue, we consider an extended nonparametric model class that encompasses most parametric models and establish its asymptotic identifiability. The results bridge parametric and nonparametric item response models and provide a solid theoretical foundation for applying nonparametric item response models to assessments with many items.
"Extended Asymptotic Identifiability of Nonparametric Item Response Models." Psychometrika.
Pub Date : 2024-04-17 DOI: 10.1007/s11336-024-09964-7
Klaas Sijtsma, Jules L. Ellis, Denny Borsboom
The sum score on a psychological test is, and should continue to be, a central tool in psychometric practice. This position runs counter to several psychometricians’ belief that the sum score represents a pre-scientific conception that must be abandoned from psychometrics in favor of latent variables. First, we reiterate that the sum score stochastically orders the latent variable in a wide variety of much-used item response models. In fact, item response theory provides a mathematically based justification for the ordinal use of the sum score. Second, because discussions about the sum score often involve its reliability and estimation methods as well, we show that, based on very general assumptions, classical test theory provides a family of lower bounds, several of which are close to the true reliability under reasonable conditions. Finally, we argue that sum scores ultimately derive their value from the degree to which they enable predicting practically relevant events and behaviors. None of our discussion is meant to discredit modern measurement models; they have their own merits unattainable for classical test theory, but the latter provides impressive contributions to psychometrics based on very few assumptions that seem to have become obscured in the past few decades. Their generality and practical usefulness add to the accomplishments of more recent approaches.
Title: "Recognize the Value of the Sum Score, Psychometrics’ Greatest Accomplishment", by Klaas Sijtsma, Jules L. Ellis, Denny Borsboom. Psychometrika, DOI: 10.1007/s11336-024-09964-7, published 2024-04-17.
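The classical-test-theory lower bounds the abstract refers to are easy to compute from an inter-item covariance matrix. The sketch below (illustrative assumptions throughout, not the authors' analysis) computes two well-known members of that family, Cronbach's alpha and Guttman's lambda-2, for simulated congeneric item scores; lambda-2 is always at least as large as alpha.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate congeneric continuous item scores (choices are assumptions).
n, k = 5000, 10
true_score = rng.normal(size=n)
loadings = rng.uniform(0.5, 0.9, size=k)
items = true_score[:, None] * loadings[None, :] + rng.normal(size=(n, k))

S = np.cov(items, rowvar=False)      # inter-item covariance matrix
off = S - np.diag(np.diag(S))        # off-diagonal covariances
var_total = S.sum()                  # variance of the sum score

# Cronbach's alpha (Guttman's lambda-3): a lower bound to reliability.
alpha = (k / (k - 1)) * (off.sum() / var_total)

# Guttman's lambda-2: a uniformly greater (or equal) lower bound.
lambda2 = (off.sum() + np.sqrt(k / (k - 1) * (off ** 2).sum())) / var_total
```

Both quantities need only the observed covariance matrix and the very weak assumptions of classical test theory, which is the point the abstract emphasizes.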
Pub Date : 2024-04-15 DOI: 10.1007/s11336-024-09968-3
Frank Miller, Ellinor Fackle-Fornius
When large achievement tests are conducted regularly, items need to be calibrated before being used as operational items in a test. Methods have been developed to optimally assign pretest items to examinees based on their abilities. Most of these methods, however, are intended for situations where examinees arrive sequentially to be assigned to calibration items. In several calibration tests, examinees take the test simultaneously or in parallel. In this article, we develop an optimal calibration design tailored for such parallel test setups. Our objective is both to investigate the efficiency gain of the method and to demonstrate that it can be implemented in real calibration scenarios. For the latter, we have employed the method to calibrate items for the Swedish national tests in Mathematics. In this case study, as in many real test situations, items are of mixed format, and the optimal design method needs to handle that. The method we propose works for mixed-format tests and accounts for varying expected response times. Our investigations show that the proposed method considerably enhances calibration efficiency.
Title: "Parallel Optimal Calibration of Mixed-Format Items for Achievement Tests", by Frank Miller, Ellinor Fackle-Fornius. Psychometrika, DOI: 10.1007/s11336-024-09968-3, published 2024-04-15.
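The core idea of ability-matched calibration can be sketched in a few lines. The following toy example is not the authors' algorithm; the item parameters, time budget, and the information-per-second criterion are all assumptions made here for illustration. It assigns each examinee the pretest item whose Fisher information at the examinee's provisional ability is largest, normalized by the item's expected response time, reflecting the abstract's point that calibration designs should account for varying response times.

```python
import numpy as np

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

rng = np.random.default_rng(2)
abilities = rng.normal(size=200)          # provisional ability estimates

# Three hypothetical pretest items (parameters are assumptions).
a = np.array([1.0, 1.5, 0.8])             # discriminations
b = np.array([-1.0, 0.0, 1.0])            # difficulties
exp_time = np.array([60.0, 90.0, 45.0])   # expected seconds per item

# Assign each examinee the item with the highest information per second
# at that examinee's ability, a simple stand-in for an optimal design.
info = fisher_info_2pl(abilities[:, None], a[None, :], b[None, :])
assignment = np.argmax(info / exp_time[None, :], axis=1)
```

A real optimal calibration design would optimize the allocation jointly over all examinees under a formal criterion (e.g., D-optimality) rather than greedily per examinee, but the sketch shows why matching items to abilities, and weighting by response time, improves calibration efficiency.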