Pub Date: 2024-06-01. Epub Date: 2024-02-19. DOI: 10.1007/s11336-024-09947-8
Minerva Mukhopadhyay, Jacie R McHaney, Bharath Chandrasekaran, Abhra Sarkar
Understanding how the adult human brain learns novel categories is an important problem in neuroscience. Drift-diffusion models are popular in such contexts for their ability to mimic the underlying neural mechanisms. One such model for gradual longitudinal learning was recently developed in Paulon et al. (J Am Stat Assoc 116:1114-1127, 2021). In practice, category response accuracies are often the only reliable measure recorded by behavioral scientists to describe human learning. To our knowledge, however, drift-diffusion models for such scenarios have never been considered in the literature before. To address this gap, in this article, we build carefully on Paulon et al. (2021), but now with latent response times integrated out, to derive a novel, biologically interpretable class of 'inverse-probit' categorical probability models for observed categories alone. This new marginal model, however, presents significant identifiability and inferential challenges not encountered for the original joint model in Paulon et al. (2021). We address these new challenges using a novel projection-based approach with a symmetry-preserving identifiability constraint that allows us to work with conjugate priors in an unconstrained space. We adapt the model for group- and individual-level inference in longitudinal settings. Building again on the model's latent variable representation, we design an efficient Markov chain Monte Carlo algorithm for posterior computation. We evaluate the empirical performance of the method through simulation experiments. Its practical efficacy is illustrated in applications to longitudinal tone learning studies.
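The 'inverse-probit' name comes from the inverse Gaussian (Wald) distribution of drift-diffusion first-passage times. As a rough illustration, not the authors' model, the sketch below simulates a two-accumulator race with hypothetical drift and boundary values and recovers the marginal category probability by averaging over the latent response times:

```python
import numpy as np

rng = np.random.default_rng(0)

def race_accuracy(drift_correct, drift_wrong, boundary=2.0, n=20000):
    # First-passage time of a single drift-diffusion accumulator is
    # inverse-Gaussian (Wald) with mean boundary/drift and shape boundary^2.
    t_correct = rng.wald(boundary / drift_correct, boundary ** 2, size=n)
    t_wrong = rng.wald(boundary / drift_wrong, boundary ** 2, size=n)
    # The observed category is the accumulator that hits its boundary first;
    # marginalizing over the latent times gives the category probability.
    return np.mean(t_correct < t_wrong)

acc_early = race_accuracy(1.0, 0.8)  # similar drifts: modest accuracy
acc_late = race_accuracy(2.0, 0.8)   # learning raises the correct drift
```

Raising the drift of the correct accumulator, as learning does over sessions in the longitudinal model, increases the marginal response accuracy.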
"Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning." Psychometrika, pp. 461-485.
Pub Date: 2024-06-01. Epub Date: 2024-02-13. DOI: 10.1007/s11336-024-09950-z
Stefano Noventa, Sangbeak Ye, Augustin Kelava, Andrea Spoto
The present work aims to show that the identification problems (here meaning both empirical indistinguishability and unidentifiability) of some item response theory models are related to the notion of identifiability in knowledge space theory. Specifically, the identification problems of the 3- and 4-parameter models are related to the more general issues of forward- and backward-gradedness in all items of the power set, which is the knowledge structure associated with IRT models under the assumption of local independence. As a consequence, the identifiability problem of a 4-parameter model splits into two parts: the first results from a trade-off between the left-side added parameters and the remainder of the item response function, e.g., a 2-parameter model, and the second is the already well-known identifiability issue of the 2-parameter model itself. Applying the results to the logistic case appears to both confirm and generalize the current findings in the literature for fixed- and random-effects IRT logistic models.
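For orientation, here is a minimal sketch of the 4-parameter logistic item response function with hypothetical parameter values; the left-side parameters c and d (guessing and slipping asymptotes) are the ones implicated in the trade-off described above:

```python
import numpy as np

def irf_4pl(theta, a, b, c, d):
    # 4PL: a 2PL core with lower asymptote c (guessing) and upper
    # asymptote d (slipping); c = 0, d = 1 recovers the 2PL.
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4.0, 4.0, 9)
p = irf_4pl(theta, a=1.2, b=0.0, c=0.2, d=0.95)
```

The curve is strictly increasing and confined to the band (c, d), which is why the added asymptotes can partly absorb changes in the 2PL core (a, b).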
"On the Identifiability of 3- and 4-Parameter Item Response Theory Models From the Perspective of Knowledge Space Theory." Psychometrika, pp. 486-516. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11164782/pdf/
Pub Date: 2024-06-01. Epub Date: 2024-02-15. DOI: 10.1007/s11336-024-09951-y
Ling Chen, Yuqi Gu
Grade of membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. Therefore, GoM models have a richer modeling capacity than latent class models that restrict each subject to belong to a single profile. The flexibility of GoM comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD)-based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a huge computational advantage over Bayesian or likelihood-based methods and is scalable to large-scale and high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate our method by applying it to a personality test dataset.
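The low-rank observation the method hinges on can be sketched as follows; all dimensions and parameter values are hypothetical, and this reproduces only the starting point of the spectral approach, not the full estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, K = 500, 40, 3  # subjects, items, extreme profiles

# Mixed-membership scores on the simplex (rows sum to 1) and the extreme
# profiles' item response probabilities.
Pi = rng.dirichlet(np.ones(K), size=N)
Theta = rng.uniform(0.1, 0.9, size=(K, J))

P = Pi @ Theta            # expectation of the binary data matrix
R = rng.binomial(1, P)    # observed responses

# The expectation has rank at most K, so the spectral approach retains
# only the K leading singular vectors of the observed matrix R.
s_exp = np.linalg.svd(P, compute_uv=False)
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_K = U[:, :K]            # leading left singular vectors, N x K
```

The simplex geometry of rows of such leading singular vectors is what the paper exploits to estimate the membership scores.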
"A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses." Psychometrika, pp. 626-657.
Pub Date: 2024-06-01. Epub Date: 2024-04-01. DOI: 10.1007/s11336-024-09960-x
Kylie Gorney, Sandip Sinharay, Carol Eckerly
Many popular person-fit statistics belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since T is computed using (a) an estimated ability parameter and (b) a finite number of items. Snijders (Psychometrika 66(3):331-342, 2001) developed mean and variance corrections for T to account for the use of an estimated ability parameter. Bedrick (Psychometrika 62(2):191-199, 1997) and Molenaar and Hoijtink (Psychometrika 55(1):75-106, 1990) developed skewness corrections for T to account for the use of a finite number of items. In this paper, we combine these two lines of research and propose three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. We conducted a detailed simulation study and found that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. A real data example is also included.
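As a sketch of the class of statistics being corrected, the block below computes the common standardized log-likelihood person-fit statistic with ability treated as known; none of the paper's corrections are applied, and the item probabilities are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def lz_statistic(x, p):
    # Unstandardized statistic: log-likelihood of the response pattern.
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # Null mean and variance, treating ability as known; the corrections in
    # the paper adjust for an estimated ability and a finite test length.
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - mean) / np.sqrt(var)

p = rng.uniform(0.3, 0.9, size=40)                  # model-implied probabilities
x_typical = (rng.uniform(size=40) < p).astype(int)  # model-consistent pattern
x_aberrant = 1 - x_typical                          # reversed, aberrant pattern

z_typical = lz_statistic(x_typical, p)
z_aberrant = lz_statistic(x_aberrant, p)
```

An aberrant (here, reversed) pattern yields a large negative value, while a model-consistent pattern stays near zero; the paper's corrections make the reference distribution for such statistics more accurate.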
"Efficient Corrections for Standardized Person-Fit Statistics." Psychometrika, pp. 569-591.
Pub Date: 2024-06-01. Epub Date: 2023-11-16. DOI: 10.1007/s11336-023-09936-3
Yang Liu, Weimeng Wang
It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores than those conventionally predicted from responses alone. For this purpose, a simple-structure factor model is often preferred, as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for responses intact. The added speed factor indicated by item-level RT correlates with the ability factor in the IRT model, allowing RT data to carry additional information about respondents' ability. However, parametric simple-structure factor models are often restrictive and fit poorly to empirical data, which casts doubt on the suitability of a simple factor structure. In the present paper, we analyze the 2015 Programme for International Student Assessment mathematics data using a semiparametric simple-structure model. We conclude that a simple factor structure attains a decent fit once further parametric assumptions in the measurement model are sufficiently relaxed. Furthermore, our semiparametric model implies that the association between latent ability and speed/slowness is strong in the population, but the form of the association is nonlinear. It follows that scoring based on the fitted model can substantially improve the precision of ability scores.
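A toy sketch of why item-level RT can sharpen ability scores under a simple-structure model; the linear ability-slowness link and all parameter values below are assumptions for illustration only (the paper's point is precisely that the link can be nonlinear):

```python
import numpy as np

rng = np.random.default_rng(3)
n, J = 4000, 20

# Correlated ability (theta) and slowness (tau).
theta = rng.normal(size=n)
tau = -0.6 * theta + 0.8 * rng.normal(size=n)

# Simple structure: responses load only on theta, log-RTs only on tau.
p = 1.0 / (1.0 + np.exp(-theta[:, None]))
x = (rng.uniform(size=(n, J)) < p).astype(int)
log_rt = tau[:, None] + 0.5 * rng.normal(size=(n, J))

score = x.mean(axis=1)
mean_lrt = log_rt.mean(axis=1)

def r_squared(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

# Predicting the latent ability from responses alone vs. responses plus RT.
ones = np.ones((n, 1))
r2_resp = r_squared(np.hstack([ones, score[:, None]]), theta)
r2_joint = r_squared(np.hstack([ones, score[:, None], mean_lrt[:, None]]), theta)
```

Because tau carries information about theta, adding the RT summary improves the recovery of ability, which is the mechanism the joint analysis exploits.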
"What Can We Learn from a Semiparametric Factor Analysis of Item Responses and Response Time? An Illustration with the PISA 2015 Data." Psychometrika, pp. 386-410.
Pub Date: 2024-04-28. DOI: 10.1007/s11336-024-09971-8
Chun Wang
Modern assessment demands, resulting from educational reform efforts, call for strengthening diagnostic testing capabilities to identify not only the understanding of expected learning goals but also related intermediate understandings that are steppingstones on pathways to learning goals. An accurate and nuanced way of interpreting assessment results will allow subsequent instructional actions to be targeted. An appropriate psychometric model is indispensable in this regard. In this study, we developed a new psychometric model, namely, the diagnostic facet status model (DFSM), which belongs to the general class of cognitive diagnostic models (CDM), but with two notable features: (1) it simultaneously models students’ target understanding (i.e., goal facet) and intermediate understanding (i.e., intermediate facet); and (2) it models every response option, rather than merely right or wrong responses, so that each incorrect response uniquely contributes to discovering students’ facet status. Given that some combination of goal and intermediate facets may be impossible due to facet hierarchical relationships, a regularized expectation–maximization algorithm (REM) was developed for model estimation. A log-penalty was imposed on the mixing proportions to encourage sparsity. As a result, those impermissible latent classes had estimated mixing proportions equal to 0. A heuristic algorithm was proposed to infer a facet map from the estimated permissible classes. A simulation study was conducted to evaluate the performance of REM to recover facet model parameters and to identify permissible latent classes. A real data analysis was provided to show the feasibility of the model.
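One generic way a log-penalty on mixing proportions zeroes out impermissible or empty classes in an M-step is sketched below; this is a standard sparsity-inducing EM update, not necessarily the paper's exact REM:

```python
import numpy as np

def penalized_mixing_update(counts, lam):
    # M-step under the penalty -lam * sum_k log(pi_k): maximizing
    # sum_k (counts_k - lam) * log(pi_k) over the simplex pushes any class
    # whose expected count falls below lam to the boundary pi_k = 0.
    w = np.maximum(counts - lam, 0.0)
    return w / w.sum()

counts = np.array([400.0, 250.0, 30.0, 5.0])  # expected class counts (E-step)
pi = penalized_mixing_update(counts, lam=50.0)
```

Classes whose posterior support never exceeds the penalty threshold get exactly zero mixing proportion, so the surviving classes can be read off as the permissible latent classes.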
"A Diagnostic Facet Status Model (DFSM) for Extracting Instructionally Useful Information from Diagnostic Assessment." Psychometrika.
Pub Date: 2024-04-25. DOI: 10.1007/s11336-024-09973-6
Paolo Girardi, Anna Vesely, Daniël Lakens, Gianmarco Altoè, Massimiliano Pastore, Antonio Calcagnì, Livio Finos
When analyzing data, researchers make choices that are either arbitrary, based on subjective beliefs about the data-generating process, or for which equally justifiable alternatives could have been made. This wide range of data-analytic choices can be abused and has been one of the underlying causes of the replication crisis in several fields. The recently introduced multiverse analysis provides researchers with a method to evaluate the stability of results across the reasonable choices that could be made when analyzing data. Multiverse analysis, however, is confined to a descriptive role, lacking a proper and comprehensive inferential procedure. Specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model and only allows researchers to infer whether at least one specification rejects the null hypothesis, not which specifications should be selected. In this paper, we present a Post-selection Inference approach to Multiverse Analysis (PIMA), a flexible and general inferential approach that considers all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e., preprocessing choices) and any generalized linear model; it tests the null hypothesis that a given predictor is not associated with the outcome by combining information from all reasonable models of the multiverse analysis, and it provides strong control of the family-wise error rate, allowing researchers to claim that the null hypothesis can be rejected for any specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. We formally prove that the Type I error rate is controlled and compute the statistical power of the test through a simulation study. Finally, we apply the PIMA procedure to the analysis of a real dataset on self-reported hesitancy for the COronaVIrus Disease 2019 (COVID-19) vaccine before and after the 2020 lockdown in Italy. We conclude with practical recommendations to be considered when implementing the proposed procedure.
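A minimal sketch of a sign-flipping score test for a single specification, the building block named in the title; the data and effect size are hypothetical, and the full PIMA procedure combines such tests across all specifications of the multiverse:

```python
import numpy as np

rng = np.random.default_rng(4)
n, B = 200, 999

x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)  # simulated data with a real effect

# Score contributions for H0: beta = 0 in a Gaussian linear model with an
# intercept; under H0 they are (asymptotically) sign-symmetric.
v = x * (y - y.mean())
t_obs = v.sum()

# Random sign flips generate the null distribution of the score statistic.
flips = rng.choice([-1.0, 1.0], size=(B, n))
t_null = (flips * v).sum(axis=1)
p_value = (1 + np.sum(np.abs(t_null) >= abs(t_obs))) / (B + 1)
```

Using the same flip matrix across all specifications preserves their dependence, which is what makes a max-type combination and family-wise error control possible.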
"Post-selection Inference in Multiverse Analysis (PIMA): An Inferential Framework Based on the Sign Flipping Score Test." Psychometrika.
Pub Date: 2024-04-24. DOI: 10.1007/s11336-024-09972-7
Yinqiu He
Nonparametric item response models provide a flexible framework in psychological and educational measurement. Douglas (Psychometrika 66(4):531-540, 2001) established asymptotic identifiability for a class of models with nonparametric response functions for long assessments. Nevertheless, the model class examined in Douglas (2001) excludes several popular parametric item response models. This limitation can hinder applications in which nonparametric and parametric models are compared, such as evaluating model goodness-of-fit. To address this issue, we consider an extended nonparametric model class that encompasses most parametric models and establish its asymptotic identifiability. The results bridge parametric and nonparametric item response models and provide a solid theoretical foundation for applying nonparametric item response models to assessments with many items.
"Extended Asymptotic Identifiability of Nonparametric Item Response Models." Psychometrika.
Pub Date : 2024-04-17 DOI: 10.1007/s11336-024-09964-7
Klaas Sijtsma, Jules L. Ellis, Denny Borsboom
The sum score on a psychological test is, and should continue to be, a central tool in psychometric practice. This position runs counter to several psychometricians’ belief that the sum score represents a pre-scientific conception that must be abandoned from psychometrics in favor of latent variables. First, we reiterate that the sum score stochastically orders the latent variable in a wide variety of much-used item response models. In fact, item response theory provides a mathematically based justification for the ordinal use of the sum score. Second, because discussions about the sum score often involve its reliability and estimation methods as well, we show that, based on very general assumptions, classical test theory provides a family of lower bounds, several of which are close to the true reliability under reasonable conditions. Finally, we argue that sum scores ultimately derive their value from the degree to which they enable predicting practically relevant events and behaviors. None of our discussion is meant to discredit modern measurement models; they have their own merits unattainable for classical test theory, but the latter provides impressive contributions to psychometrics based on very few assumptions that seem to have become obscured in the past few decades. Their generality and practical usefulness add to the accomplishments of more recent approaches.
Title: "Recognize the Value of the Sum Score, Psychometrics’ Greatest Accomplishment", by Klaas Sijtsma, Jules L. Ellis, Denny Borsboom. Psychometrika, DOI: 10.1007/s11336-024-09964-7, published 2024-04-17.
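The classical-test-theory lower bounds the abstract refers to are easy to compute from an inter-item covariance matrix. The sketch below (illustrative assumptions throughout, not the authors' analysis) computes two well-known members of that family, Cronbach's alpha and Guttman's lambda-2, for simulated congeneric item scores; lambda-2 is always at least as large as alpha.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate congeneric continuous item scores (choices are assumptions).
n, k = 5000, 10
true_score = rng.normal(size=n)
loadings = rng.uniform(0.5, 0.9, size=k)
items = true_score[:, None] * loadings[None, :] + rng.normal(size=(n, k))

S = np.cov(items, rowvar=False)      # inter-item covariance matrix
off = S - np.diag(np.diag(S))        # off-diagonal covariances
var_total = S.sum()                  # variance of the sum score

# Cronbach's alpha (Guttman's lambda-3): a lower bound to reliability.
alpha = (k / (k - 1)) * (off.sum() / var_total)

# Guttman's lambda-2: a uniformly greater (or equal) lower bound.
lambda2 = (off.sum() + np.sqrt(k / (k - 1) * (off ** 2).sum())) / var_total
```

Both quantities need only the observed covariance matrix and the very weak assumptions of classical test theory, which is the point the abstract emphasizes.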
Pub Date : 2024-04-15 DOI: 10.1007/s11336-024-09968-3
Frank Miller, Ellinor Fackle-Fornius
When large achievement tests are conducted regularly, items need to be calibrated before being used as operational items in a test. Methods have been developed to optimally assign pretest items to examinees based on their abilities. Most of these methods, however, are intended for situations where examinees arrive sequentially to be assigned to calibration items. In several calibration tests, examinees take the test simultaneously or in parallel. In this article, we develop an optimal calibration design tailored for such parallel test setups. Our objective is both to investigate the efficiency gain of the method and to demonstrate that it can be implemented in real calibration scenarios. For the latter, we have employed the method to calibrate items for the Swedish national tests in Mathematics. In this case study, as in many real test situations, items are of mixed format, and the optimal design method needs to handle that. The method we propose works for mixed-format tests and accounts for varying expected response times. Our investigations show that the proposed method considerably enhances calibration efficiency.
Title: "Parallel Optimal Calibration of Mixed-Format Items for Achievement Tests", by Frank Miller, Ellinor Fackle-Fornius. Psychometrika, DOI: 10.1007/s11336-024-09968-3, published 2024-04-15.
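The core idea of ability-matched calibration can be sketched in a few lines. The following toy example is not the authors' algorithm; the item parameters, time budget, and the information-per-second criterion are all assumptions made here for illustration. It assigns each examinee the pretest item whose Fisher information at the examinee's provisional ability is largest, normalized by the item's expected response time, reflecting the abstract's point that calibration designs should account for varying response times.

```python
import numpy as np

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

rng = np.random.default_rng(2)
abilities = rng.normal(size=200)          # provisional ability estimates

# Three hypothetical pretest items (parameters are assumptions).
a = np.array([1.0, 1.5, 0.8])             # discriminations
b = np.array([-1.0, 0.0, 1.0])            # difficulties
exp_time = np.array([60.0, 90.0, 45.0])   # expected seconds per item

# Assign each examinee the item with the highest information per second
# at that examinee's ability, a simple stand-in for an optimal design.
info = fisher_info_2pl(abilities[:, None], a[None, :], b[None, :])
assignment = np.argmax(info / exp_time[None, :], axis=1)
```

A real optimal calibration design would optimize the allocation jointly over all examinees under a formal criterion (e.g., D-optimality) rather than greedily per examinee, but the sketch shows why matching items to abilities, and weighting by response time, improves calibration efficiency.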