Psychometrika: Latest Articles

Optimizing Large-Scale Educational Assessment with a "Divide-and-Conquer" Strategy: Fast and Efficient Distributed Bayesian Inference in IRT Models.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-05-30 DOI: 10.1007/s11336-024-09978-1
Sainan Xu, Jing Lu, Jiwei Zhang, Chun Wang, Gongjun Xu

With the growing attention on large-scale educational testing and assessment, the ability to process substantial volumes of response data becomes crucial. Current estimation methods within item response theory (IRT), despite their high precision, often pose considerable computational burdens with large-scale data, leading to reduced computational speed. This study introduces a novel "divide-and-conquer" parallel algorithm built on the Wasserstein posterior approximation concept, aiming to enhance computational speed while maintaining accurate parameter estimation. This algorithm enables drawing parameters from segmented data subsets in parallel, followed by an amalgamation of these parameters via Wasserstein posterior approximation. Theoretical support for the algorithm is established through asymptotic optimality under certain regularity assumptions. Practical validation is demonstrated using real-world data from the Programme for International Student Assessment. Ultimately, this research proposes a transformative approach to managing educational big data, offering a scalable, efficient, and precise alternative that promises to redefine traditional practices in educational assessments.
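
The abstract does not spell out how the subset posteriors are merged, so the following is only a rough sketch of one generic way to combine draws for a single scalar parameter. It relies on a standard fact: in one dimension, the 2-Wasserstein barycenter of several distributions has a quantile function equal to the average of their quantile functions. The function names, the three simulated "subset posteriors", and the equal weighting are all assumptions made for illustration, not the authors' algorithm.

```python
import random
import statistics


def wasserstein_barycenter_1d(subset_draws):
    """Combine equal-length lists of posterior draws for one scalar parameter.

    Sorting each list gives its empirical quantiles; averaging the i-th order
    statistic across subsets gives the empirical quantiles of the equal-weight
    1-D Wasserstein-2 barycenter.
    """
    n = len(subset_draws[0])
    if any(len(draws) != n for draws in subset_draws):
        raise ValueError("all subsets must supply the same number of draws")
    sorted_draws = [sorted(draws) for draws in subset_draws]
    k = len(sorted_draws)
    return [sum(draws[i] for draws in sorted_draws) / k for i in range(n)]


def demo(seed=0):
    """Simulate three hypothetical subset posteriors and merge them."""
    rng = random.Random(seed)
    centers = [0.9, 1.0, 1.1]  # assumed subset posterior means (illustrative)
    draws = [[rng.gauss(c, 0.2) for _ in range(2000)] for c in centers]
    combined = wasserstein_barycenter_1d(draws)
    return statistics.mean(combined), statistics.stdev(combined)


if __name__ == "__main__":
    mean, sd = demo()
    print(f"combined mean={mean:.3f}, sd={sd:.3f}")
```

In a real divide-and-conquer run, each inner list would come from an MCMC sampler fitted to one data subset in parallel. Any rescaling of the subset likelihoods before merging is outside what this sketch covers.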

Pages: 1119-1147
Citations: 0
Are Sum Scores a Great Accomplishment of Psychometrics or Intuitive Test Theory?
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-10-22 DOI: 10.1007/s11336-024-10003-8
Robert J Mislevy

Sijtsma, Ellis, and Borsboom (Psychometrika, 89:84-117, 2024. https://doi.org/10.1007/s11336-024-09964-7) provide a thoughtful treatment in Psychometrika of the value and properties of sum scores and classical test theory at a depth with which few practicing psychometricians are familiar. In this note, I offer comments on their article from the perspective of evidentiary reasoning.

Pages: 1170-1174
Citations: 0
New Paradigm of Identifiable General-response Cognitive Diagnostic Models: Beyond Categorical Data.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-07-05 DOI: 10.1007/s11336-024-09983-4
Seunghyun Lee, Yuqi Gu

Cognitive diagnostic models (CDMs) are a popular family of discrete latent variable models that model students' mastery or deficiency of multiple fine-grained skills. CDMs have been most widely used to model categorical item response data such as binary or polytomous responses. With advances in technology and the emergence of varying test formats in modern educational assessments, new response types, including continuous responses such as response times, and count-valued responses from tests with repetitive tasks or eye-tracking sensors, have also become available. Variants of CDMs have been proposed recently for modeling such responses. However, whether these extended CDMs are identifiable and estimable is entirely unknown. We propose a very general cognitive diagnostic modeling framework for arbitrary types of multivariate responses with minimal assumptions, and establish identifiability in this general setting. Surprisingly, we prove that our general-response CDMs are identifiable under Q-matrix-based conditions similar to those for traditional categorical-response CDMs. Our conclusions set up a new paradigm of identifiable general-response CDMs. We propose an EM algorithm to efficiently estimate a broad class of exponential family-based general-response CDMs. We conduct simulation studies under various response types. The simulation results not only corroborate our identifiability theory, but also demonstrate the superior empirical performance of our estimation algorithms. We illustrate our methodology by applying it to a TIMSS 2019 response time dataset.

Pages: 1304-1336
Citations: 0
Ordinal Outcome State-Space Models for Intensive Longitudinal Data.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-06-11 DOI: 10.1007/s11336-024-09984-3
Teague R Henry, Lindley R Slipetz, Ami Falk, Jiaxing Qiu, Meng Chen

Intensive longitudinal (IL) data are increasingly prevalent in psychological science, coinciding with technological advancements that make it simple to deploy study designs such as daily diary and ecological momentary assessments. IL data are characterized by a rapid rate of data collection (1+ collections per day), over a period of time, allowing for the capture of the dynamics that underlie psychological and behavioral processes. One powerful framework for analyzing IL data is state-space modeling, where observed variables are considered measurements for underlying states (i.e., latent variables) that change together over time. However, state-space modeling has typically relied on continuous measurements, whereas psychological data often come in the form of ordinal measurements such as Likert scale items. In this manuscript, we develop a general estimation approach for state-space models with ordinal measurements, specifically focusing on a graded response model for Likert scale items. We evaluate the performance of our model and estimator against that of the commonly used "linear approximation" model, which treats ordinal measurements as though they are continuous. We find that our model resulted in unbiased estimates of the state dynamics, while the linear approximation resulted in strongly biased estimates of the state dynamics. Finally, we develop an approximate standard error, termed slice standard errors, and show that these approximate standard errors are more liberal than true standard errors (i.e., smaller) at a consistent bias.
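
For readers unfamiliar with the graded response model mentioned above, here is a minimal sketch of how a Likert item's category probabilities follow from a latent state value. It assumes the common logistic form with a discrimination parameter and ordered thresholds. The paper's exact parameterization and link function are not given in the abstract, so all names and values here are illustrative.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one ordinal item under a logistic GRM.

    P(Y >= k | theta) = logistic(discrimination * (theta - b_k)) for each
    ordered threshold b_k; category probabilities are successive differences
    of these cumulative probabilities.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    cumulative = (
        [1.0]
        + [logistic(discrimination * (theta - b)) for b in thresholds]
        + [0.0]
    )
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # A 4-category item with assumed thresholds, evaluated at latent state 0.
    probs = grm_category_probs(theta=0.0, discrimination=1.0,
                               thresholds=[-1.0, 0.0, 1.0])
    print([round(p, 4) for p in probs])
```

In a state-space setting, `theta` would be the latent state at a given time point, so the same function could supply the measurement likelihood at each occasion.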

Pages: 1203-1229. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11582181/pdf/
Citations: 0
Reliability Theory for Measurements with Variable Test Length, Illustrated with ERN and Pe Collected in the Flanker Task.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-07-21 DOI: 10.1007/s11336-024-09982-5
Jules L Ellis, Klaas Sijtsma, Kristel de Groot, Patrick J F Groenen

In psychophysiology, an interesting question is how to estimate the reliability of event-related potentials collected by means of the Eriksen Flanker Task or similar tests. A special problem presents itself if the data represent neurological reactions that are associated with some responses (in case of the Flanker Task, responding incorrectly on a trial) but not others (like when providing a correct response), inherently resulting in unequal numbers of observations per subject. The general trend in reliability research here is to use generalizability theory and Bayesian estimation. We show that a new approach based on classical test theory and frequentist estimation can do the job as well and in a simpler way, and even provides additional insight to matters that were unsolved in the generalizability method approach. One of our contributions is the definition of a single, overall reliability coefficient for an entire group of subjects with unequal numbers of observations. Both methods have slightly different objectives. We argue in favor of the classical approach but without rejecting the generalizability approach.

Pages: 1280-1303. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11582099/pdf/
Citations: 0
A Note on Ising Network Analysis with Missing Data.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-07-06 DOI: 10.1007/s11336-024-09985-2
Siliang Zhang, Yunxiao Chen

The Ising model has become a popular psychometric model for analyzing item response data. The statistical inference of the Ising model is typically carried out via a pseudo-likelihood, as the standard likelihood approach suffers from a high computational cost when there are many variables (i.e., items). Unfortunately, the presence of missing values can hinder the use of pseudo-likelihood, and a listwise deletion approach for missing data treatment may introduce a substantial bias into the estimation and sometimes yield misleading interpretations. This paper proposes a conditional Bayesian framework for Ising network analysis with missing data, which integrates a pseudo-likelihood approach with iterative data imputation. An asymptotic theory is established for the method. Furthermore, a computationally efficient Pólya-Gamma data augmentation procedure is proposed to streamline the sampling of model parameters. The method's performance is shown through simulations and a real-world application to data on major depressive and generalized anxiety disorders from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC).
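
To make the pseudo-likelihood idea concrete, the sketch below assumes a 0/1-coded Ising model with joint probability proportional to exp(sum_j tau_j y_j + sum_{j<k} omega_jk y_j y_k). Under that coding, each item's conditional distribution given the others is a logistic regression, and the log pseudo-likelihood is the sum of those conditional log-probabilities, so the normalizing constant is never needed. The parameter names and the brute-force check are illustrative assumptions. The sketch handles complete data only and does not reproduce the paper's imputation or Pólya-Gamma steps.

```python
import math
from itertools import product


def conditional_prob_one(j, y, thresholds, interactions):
    """P(y_j = 1 | all other items) under a 0/1-coded Ising model."""
    eta = thresholds[j] + sum(
        interactions[j][k] * y[k] for k in range(len(y)) if k != j
    )
    return 1.0 / (1.0 + math.exp(-eta))


def log_pseudo_likelihood(data, thresholds, interactions):
    """Sum of item-wise conditional log-probabilities over complete responses."""
    total = 0.0
    for y in data:
        for j in range(len(y)):
            p = conditional_prob_one(j, y, thresholds, interactions)
            total += math.log(p if y[j] == 1 else 1.0 - p)
    return total


def exact_conditional(j, y, thresholds, interactions):
    """Same conditional computed by brute force from the unnormalized joint."""
    def weight(v):
        linear = sum(t * vi for t, vi in zip(thresholds, v))
        pairwise = sum(
            interactions[a][b] * v[a] * v[b]
            for a in range(len(v))
            for b in range(a + 1, len(v))
        )
        return math.exp(linear + pairwise)

    y_on = list(y)
    y_on[j] = 1
    y_off = list(y)
    y_off[j] = 0
    return weight(y_on) / (weight(y_on) + weight(y_off))


if __name__ == "__main__":
    tau = [0.2, -0.5, 0.1]                      # assumed item thresholds
    omega = [[0.0, 0.8, -0.3],                  # assumed symmetric interactions
             [0.8, 0.0, 0.5],
             [-0.3, 0.5, 0.0]]
    worst = max(
        abs(conditional_prob_one(j, y, tau, omega)
            - exact_conditional(j, y, tau, omega))
        for y in product([0, 1], repeat=3)
        for j in range(3)
    )
    print("max discrepancy:", worst)
    print(log_pseudo_likelihood([(1, 0, 1), (0, 0, 0)], tau, omega))
```

The brute-force function is included only to show that the logistic conditional agrees with the joint model; with many items it would be infeasible, which is the motivation for the pseudo-likelihood in the first place.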

Pages: 1186-1202. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11582142/pdf/
Citations: 0
Practical Implications of Sum Scores Being Psychometrics' Greatest Accomplishment.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-07-20 DOI: 10.1007/s11336-024-09988-z
Daniel McNeish

This paper reflects on some practical implications of the excellent treatment of sum scoring and classical test theory (CTT) by Sijtsma et al. (Psychometrika 89(1):84-117, 2024). I have no major disagreements about the content they present and found it to be an informative clarification of the properties and possible extensions of CTT. In this paper, I focus on whether sum scores, despite their mathematical justification, are positioned to improve psychometric practice in empirical studies in psychology, education, and adjacent areas. First, I summarize recent reviews of psychometric practice in empirical studies, subsequent calls for greater psychometric transparency and validity, and how sum scores may or may not be positioned to adhere to such calls. Second, I consider limitations of sum scores for prediction, especially in the presence of common features like ordinal or Likert response scales, multidimensional constructs, and moderated or heterogeneous associations. Third, I review previous research outlining potential limitations of using sum scores as outcomes in subsequent analyses where rank ordering is not always sufficient to successfully characterize group differences or change over time. Fourth, I cover potential challenges for providing validity evidence for whether sum scores represent a single construct, particularly if one wishes to maintain minimal CTT assumptions. I conclude with thoughts about whether sum scores, even if mathematically justified, are positioned to improve psychometric practice in empirical studies.

Pages: 1148-1169
Citations: 0
Bayesian Adaptive Lasso for Detecting Item-Trait Relationship and Differential Item Functioning in Multidimensional Item Response Theory Models.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-08-10 DOI: 10.1007/s11336-024-09998-x
Na Shan, Ping-Feng Xu

In multidimensional tests, the identification of latent traits measured by each item is crucial. In addition to item-trait relationship, differential item functioning (DIF) is routinely evaluated to ensure valid comparison among different groups. The two problems are investigated separately in the literature. This paper uses a unified framework for detecting item-trait relationship and DIF in multidimensional item response theory (MIRT) models. By incorporating DIF effects in MIRT models, these problems can be considered as variable selection for latent/observed variables and their interactions. A Bayesian adaptive Lasso procedure is developed for variable selection, in which item-trait relationship and DIF effects can be obtained simultaneously. Simulation studies show the performance of our method for parameter estimation, the recovery of item-trait relationship and the detection of DIF effects. An application is presented using data from the Eysenck Personality Questionnaire.

Pages: 1337-1365
Citations: 0
Modeling Evasive Response Bias in Randomized Response: Cheater Detection Versus Self-protective No-Saying.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-01 Epub Date: 2024-08-30 DOI: 10.1007/s11336-024-10000-x
Khadiga H A Sayed, Maarten J L F Cruyff, Peter G M van der Heijden

Randomized response is an interview technique for sensitive questions designed to eliminate evasive response bias. Since this elimination is only partially successful, two models have been proposed for modeling evasive response bias: the cheater detection model for a design with two sub-samples with different randomization probabilities and the self-protective no sayers model for a design with multiple sensitive questions. This paper shows the correspondence between these models, and introduces models for the new, hybrid "ever/last year" design that account for self-protective no saying and cheating. The model for one set of ever/last year questions has a degree of freedom that can be used for the inclusion of a response bias parameter. Models with multiple degrees of freedom are introduced for extensions of the design with a third randomized response question and a second set of ever/last year questions. The models are illustrated with two surveys on doping use. We conclude with a discussion of the pros and cons of the ever/last year design and its potential for future research.
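
As background for the models discussed above, the following sketch shows the baseline moment estimator for one common randomized response scheme, the forced-response design. It is an assumption chosen for illustration, since the abstract does not say which randomization device the surveys used. Under that design, a respondent answers truthfully with probability `p_truth` and is otherwise forced to say "yes" with overall probability `p_forced_yes`, so the expected "yes" rate is `p_truth * prevalence + p_forced_yes`.

```python
def expected_yes_rate(prevalence, p_truth, p_forced_yes):
    """Expected share of 'yes' answers if every respondent follows the rules."""
    return p_truth * prevalence + p_forced_yes


def forced_response_estimate(prop_yes, p_truth, p_forced_yes):
    """Moment estimate of the sensitive-trait prevalence.

    Inverts the expected 'yes' rate. The result is not clipped to [0, 1]:
    an estimate outside that range is one signal that some respondents did
    not follow the randomization instructions.
    """
    if not 0.0 < p_truth <= 1.0:
        raise ValueError("p_truth must be in (0, 1]")
    return (prop_yes - p_forced_yes) / p_truth


if __name__ == "__main__":
    # Hypothetical design: 75% truthful, 12.5% forced yes, 12.5% forced no.
    print(forced_response_estimate(0.35, 0.75, 0.125))
    # An observed 'yes' rate below the forced-yes floor gives a negative value.
    print(forced_response_estimate(0.05, 0.75, 0.125))
```

The estimator assumes full compliance. Models that add a parameter for evasive responding, such as those named in the abstract, relax that assumption, which this sketch does not attempt.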

Psychometrika, pages 1261-1279. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11582306/pdf/
Citations: 0
Asymptotically Correct Person Fit z-Statistics For the Rasch Testlet Model.
IF 2.9 | Zone 2 (Psychology) | Q1 Mathematics, Interdisciplinary Applications | Pub Date: 2024-12-01 | Epub Date: 2024-08-17 | DOI: 10.1007/s11336-024-09997-y
Zhongtian Lin, Tao Jiang, Frank Rijmen, Paul Van Wamelen

A well-known person fit statistic in the item response theory (IRT) literature is the $l_z$ statistic (Drasgow et al. in Br J Math Stat Psychol 38(1):67-86, 1985). Snijders (Psychometrika 66(3):331-342, 2001) derived $l_z^*$, which is the asymptotically correct version of $l_z$ when the ability parameter is estimated. However, both statistics and other extensions later developed concern either only the unidimensional IRT models or multidimensional models that require a joint estimate of latent traits across all the dimensions. Considering a marginalized maximum likelihood ability estimator, this paper proposes $l_{zt}$ and $l_{zt}^*$, which are extensions of $l_z$ and $l_z^*$, respectively, for the Rasch testlet model. The computation of $l_{zt}^*$ relies on several extensions of the Lord-Wingersky algorithm (1984) that are additional contributions of this paper. Simulation results show that $l_{zt}^*$ has close-to-nominal Type I error rates and satisfactory power for detecting aberrant responses. For unidimensional models, $l_{zt}$ and $l_{zt}^*$ reduce to $l_z$ and $l_z^*$, respectively, and therefore allow for the evaluation of person fit with a wider range of IRT models. A real data application is presented to show the utility of the proposed statistics for a test with an underlying structure that consists of both the traditional unidimensional component and the Rasch testlet component.
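For readers unfamiliar with the baseline statistic, here is a minimal sketch of the classical $l_z$ for dichotomous items, written for a plain unidimensional Rasch model with a known ability value. It is not the paper's $l_{zt}$ or $l_{zt}^*$, and it omits the Snijders correction for an estimated ability. The function names and example values are hypothetical. The statistic standardizes the response-pattern log-likelihood $l_0$ by its mean $\sum_i [P_i \log P_i + (1-P_i)\log(1-P_i)]$ and variance $\sum_i P_i(1-P_i)[\log(P_i/(1-P_i))]^2$.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_statistic(responses, theta, difficulties):
    """Classical l_z person fit statistic for 0/1 responses.

    Assumes theta is known; the variance is zero (and l_z undefined)
    if every item has success probability exactly 0.5.
    """
    l0 = expected = variance = 0.0
    for u, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)      # observed log-likelihood
        expected += p * math.log(p) + q * math.log(q)       # its expectation
        variance += p * q * math.log(p / q) ** 2            # its variance
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical three-item test with difficulties -1, 0, 1 and theta = 0.5.
difficulties = [-1.0, 0.0, 1.0]
consistent = lz_statistic([1, 1, 0], 0.5, difficulties)  # easy items right, hard item wrong
aberrant = lz_statistic([0, 0, 1], 0.5, difficulties)    # reversed pattern
print(consistent > aberrant)
```

Large negative values indicate response patterns that are unlikely under the model, which is why the reversed pattern scores lower than the consistent one.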

Psychometrika, pages 1230-1260.
Citations: 0