Psychometrika: Latest Articles

Psychometric Model Framework for Multiple Response Items.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-12-19 DOI: 10.1017/psy.2025.10073

Wenjie Zhou, Lei Guo

Multiple response (MR) items, such as multiple true-false, multiple-select, and select-N items, are increasingly used in assessments to identify partial knowledge and differentiate latent abilities more accurately. Because they allow multiple selections, MR items provide richer information and reduce guessing effects compared with single-answer multiple-choice items. However, traditional scoring methods (e.g., Dichotomous, Ripkey, Partial scoring) compress response combination (RC) data, losing valuable information and ignoring issues such as local dependence and incompatibility across item types. To address these challenges, we introduce a novel psychometric model framework: the Multiple Response Model with Inter-option Local Dependencies (MRM-LD), and its simplified version, the Multiple Response Model (MRM). These models preserve RC data across MR item types, offering a more comprehensive understanding of MR assessment. Parameters for MRM-LD and MRM were estimated using Markov chain Monte Carlo algorithms in Stan and R. Empirical data from an eighth-grade physics test showed that MRM-LD and MRM outperform the Graded Response Model and the Nominal Response Model combined with three scoring methods, by retaining more test information, improving reliability and validity, and providing more detailed analysis of item characteristics. Simulation studies confirmed that the proposed models perform robustly under various conditions, including small samples and few items, demonstrating their applicability across diverse testing scenarios.
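To make the "compression" point concrete, the sketch below enumerates every response combination for one hypothetical four-option multiple true-false item and groups the combinations by score. The answer key, the function names, and the particular partial-credit rule (fraction of options classified correctly) are assumptions chosen for illustration; they are not taken from the article, and Ripkey scoring and the MRM itself are not reproduced.

```python
from collections import defaultdict
from itertools import product

# Hypothetical answer key: options A and C are true, B and D are false.
KEY = (1, 0, 1, 0)

def dichotomous(rc, key=KEY):
    """Full credit only when the whole response combination matches the key."""
    return 1.0 if tuple(rc) == tuple(key) else 0.0

def partial(rc, key=KEY):
    """Assumed partial-credit rule: fraction of options classified correctly."""
    matches = sum(r == k for r, k in zip(rc, key))
    return matches / len(key)

def group_by_score(score_fn):
    """Map each score to the list of response combinations that receive it."""
    groups = defaultdict(list)
    for rc in product((0, 1), repeat=len(KEY)):
        groups[score_fn(rc)].append(rc)
    return dict(groups)

dich_groups = group_by_score(dichotomous)
part_groups = group_by_score(partial)

# 16 distinct response combinations collapse into a handful of score values.
print(len(dich_groups), "distinct dichotomous scores")
print(len(part_groups), "distinct partial scores")
```

Under either rule, many different response combinations receive the same score, so a model fitted to the scores cannot tell those combinations apart.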

Citations: 0

Reducing Differential Item Functioning via Process Data.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-12-10 DOI: 10.1017/psy.2025.10072

Ling Chen, Susu Zhang, Jingchen Liu

Test fairness is a major concern in psychometric and educational research. A typical approach for ensuring test fairness is through differential item functioning (DIF) analysis. DIF arises when a test item functions differently across subgroups that are typically defined by the respondents' demographic characteristics. Most of the existing research focuses on the statistical detection of DIF, yet less attention has been given to reducing or eliminating DIF. Simultaneously, the use of computer-based assessments has become increasingly popular. The data obtained from respondents interacting with an item are recorded in computer log files and are referred to as process data. In this article, we propose a novel method within the framework of generalized linear models that leverages process data to reduce and understand DIF. Specifically, we construct a nuisance trait surrogate with the features extracted from process data. With the constructed nuisance trait, we introduce a new scoring rule that incorporates respondents' behaviors captured through process data on top of the target latent trait. We demonstrate the efficiency of our approach through extensive simulation experiments and an application to 13 Problem Solving in Technology-Rich Environments items from the 2012 Programme for the International Assessment of Adult Competencies assessment.

Citations: 0

Bayesian Selection Policies for Human-in-the-Loop Anomaly Detectors with Applications in Test Security.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-12-10 DOI: 10.1017/psy.2025.10056

Michael Fauss, Xiang Liu, Chen Li, Ikkyu Choi, H Vincent Poor

This article investigates the problem of automatically flagging test takers who exhibit atypical responses or behaviors for further review by human experts. The objective is to develop a selection policy that maximizes the expected number of test takers correctly identified as warranting additional scrutiny while maintaining a manageable volume of reviews per test administration. The selection procedure should learn from the outcomes of the expert reviews. Since typically only a fraction of test takers are reviewed, this leads to a semi-supervised learning problem. The latter is formalized in a Bayesian setting, and the corresponding optimal selection policy is derived. Since calculating the policy and the underlying posterior distributions is computationally infeasible, a variational approximation and three heuristic selection policies are proposed. These policies are informed by properties of the optimal policy and correspond to different exploration/exploitation trade-offs. The performance of the approximate policies is assessed via numerical experiments using both synthetic and real-world data and is compared with procedures based on off-the-shelf algorithms as well as theoretical performance bounds.
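The objective described above (maximize the expected number of correctly flagged test takers under a fixed review budget) can be illustrated with a purely exploitative baseline: if each test taker has a known posterior probability of warranting review, selecting the highest-probability cases maximizes the expected number of correct flags for that round. The sketch below is only that baseline. It is not the article's optimal policy, variational approximation, or any of its heuristics, and it does no exploration or learning from review outcomes. All names and probabilities are hypothetical.

```python
import heapq

def select_for_review(posterior_probs, budget):
    """Return the ids of the `budget` test takers with the highest posterior
    probability of warranting review (greedy, exploitation only)."""
    return heapq.nlargest(budget, posterior_probs, key=posterior_probs.get)

def expected_hits(posterior_probs, selected):
    """Expected number of correctly flagged test takers among those selected."""
    return sum(posterior_probs[i] for i in selected)

# Hypothetical posterior probabilities for five test takers.
probs = {"t1": 0.05, "t2": 0.60, "t3": 0.30, "t4": 0.90, "t5": 0.10}

chosen = select_for_review(probs, budget=2)
print(chosen, expected_hits(probs, chosen))
```

A policy that also explores would sometimes review lower-probability cases to improve future posteriors, which is the trade-off the abstract refers to.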

Citations: 0

SELF-Tree: An Interpretable Model for Multivariate Causal Direction Heterogeneity Analysis.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-12-10 DOI: 10.1017/psy.2025.10067

Zhifei Li, Hongbo Wen

Identifying causal directions among variables via data-driven approaches is a research hotspot. Researchers now focus on detecting causal direction heterogeneity among multiple variables (more than two) when covariates cause such heterogeneity. This study combines the structural equation likelihood function (SELF) method with a recursive partitioning method to obtain an interpretable model of causal direction heterogeneity in multivariable settings. Through simulation, we compared the performance of the SELF-Tree model in identifying heterogeneous causal directions under different conditions. Using a public drug consumption dataset, we demonstrated its application to real data. The SELF-Tree model offers researchers a new way to understand heterogeneity in the causal directions among variables.

Citations: 0

The bit scale: A metric score scale for unidimensional item response theory models.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-12-10 DOI: 10.1017/psy.2025.10071

Joakim Wallmark, Marie Wiberg

Citations: 0

Bayesian Joint Modeling of Response Times with Dynamic Latent Ability in Educational Testing.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-12-02 DOI: 10.1017/psy.2025.10019

Xiaojing Wang, Abhisek Saha, Dipak K Dey

In educational testing, inferences about ability have been based mainly on item responses, while the time taken to complete an item is often ignored. To better infer ability, a new class of state space models, which jointly model response times with time series of dichotomous responses, is developed. Simulations for the proposed models demonstrate that the bias of ability estimates is reduced and their precision is improved. An empirical study is conducted using EdSphere datasets, in which two competing relationships (i.e., monotone and inverted U-shape) for the distance between ability and difficulty are investigated in modeling response times. The results of model comparison indicate that the inverted U-shape relationship better captures the behaviors and psychology of examinees in exams for the EdSphere datasets.

Citations: 0

Robust Estimation of Polychoric Correlation.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-12-01 DOI: 10.1017/psy.2025.10066

Max Welz, Patrick Mair, Andreas Alfons

Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance, through violations of latent normality assumptions. We propose a novel estimator that is designed to be robust against partial misspecification of the polychoric model, that is, when the model is misspecified for an unknown fraction of observations, such as careless respondents. To this end, the estimator minimizes a robust loss function based on the divergence between observed frequencies and theoretical frequencies implied by the polychoric model. In contrast to existing literature, our estimator makes no assumption on the type or degree of model misspecification. It furthermore generalizes ML estimation, is consistent as well as asymptotically normally distributed, and comes at no additional computational cost. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and an empirical application on a Big Five administration. In the latter, the polychoric correlation estimates of our estimator and ML differ substantially, which, after further inspection, is likely due to the presence of careless respondents that the estimator helps identify.
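As background for the "theoretical frequencies implied by the polychoric model", the sketch below computes the cell probabilities of a two-way rating table from a standard bivariate normal latent pair with correlation rho and given thresholds, along with the ordinary ML negative log-likelihood. It uses only the standard library, with Simpson's rule for the bivariate normal CDF. The function names, thresholds, and integration settings are assumptions made for illustration, and the article's robust loss function is not reproduced here.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=2000, lower=-8.0, upper=8.0):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho,
    via Simpson's rule on  integral of phi(x) * Phi((b - rho*x)/sqrt(1-rho^2))."""
    if a <= lower:
        return 0.0
    a = min(a, upper)                      # truncate the negligible upper tail
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n                    # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * phi(x) * Phi((b - rho * x) / s)
    return total * h / 3.0

def cell_probs(thresholds_x, thresholds_y, rho):
    """Model-implied probability of each (row, column) rating category."""
    cx = [-math.inf, *thresholds_x, math.inf]
    cy = [-math.inf, *thresholds_y, math.inf]
    F = lambda a, b: bvn_cdf(a, b, rho)
    return [[F(cx[i + 1], cy[j + 1]) - F(cx[i], cy[j + 1])
             - F(cx[i + 1], cy[j]) + F(cx[i], cy[j])
             for j in range(len(cy) - 1)]
            for i in range(len(cx) - 1)]

def neg_log_lik(counts, probs):
    """Ordinary ML objective: minus the multinomial log-likelihood kernel."""
    return -sum(n * math.log(p)
                for row_n, row_p in zip(counts, probs)
                for n, p in zip(row_n, row_p) if n > 0)

# Hypothetical 2x2 example: both thresholds at 0, latent correlation 0.5.
p = cell_probs([0.0], [0.0], rho=0.5)
print(p)
```

Comparing observed cell proportions against `p` cell by cell is the kind of observed-versus-implied frequency comparison that the abstract describes at a high level.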

Citations: 0

A Generalized Definition of Multidimensional Item Response Theory Parameters.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-11-19 DOI: 10.1017/psy.2025.10063

Daniel Morillo-Cuadrado, Mario Luzardo-Verde

In this paper, we generalize the multidimensional discrimination and difficulty parameters in the multidimensional two-parameter logistic model to account for nonidentity latent covariances and negatively keyed items. We apply Reckase's maximum discrimination point method to define them in an arbitrary algebraic basis. Then, we define that basis to be a geometrical representation of the measured construct. This results in three different versions of the parameters: the original one, based on the item parameters solely; one that incorporates the covariance structure of the latent space; and one that uses the correlation structure instead. Importantly, we find that the items should be properly represented in a test space, distinct from the latent space. We also provide a procedure for the geometrical representation of the items in the test space and apply our results to examples from the literature to get a more accurate representation of the measurement properties of the items. We recommend using the covariance structure version for describing the properties of the parameters and the correlation structure version for graphical representation. Finally, we discuss the implications of this generalization for other multidimensional item response theory models and the parallels of our results in common factor model theory.
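For orientation, the "original" version of the parameters mentioned above corresponds to Reckase's classical definitions for the multidimensional two-parameter logistic model with logit a'θ + d: multidimensional discrimination MDISC = sqrt(Σ a_k²), multidimensional difficulty MDIFF = -d / MDISC, and direction cosines a_k / MDISC. The sketch below computes only these classical quantities for one hypothetical item. The covariance-based and correlation-based versions proposed in the article are not reproduced, and the item values are invented for illustration.

```python
import math

def mdisc(a):
    """Classical multidimensional discrimination: Euclidean norm of the slopes."""
    return math.sqrt(sum(x * x for x in a))

def mdiff(a, d):
    """Classical multidimensional difficulty: signed distance from the origin
    to the point of maximum discrimination, along the item direction."""
    return -d / mdisc(a)

def direction_cosines(a):
    """Cosines of the angles between the item direction and each latent axis."""
    m = mdisc(a)
    return [x / m for x in a]

# Hypothetical item: slopes a = (1.2, 0.5), intercept d = -0.65.
a, d = [1.2, 0.5], -0.65
print(mdisc(a), mdiff(a, d), direction_cosines(a))
```

These formulas treat the latent axes as orthogonal with identity covariance, which is exactly the restriction the article sets out to relax.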

Citations: 0

Two Markov Solution Process Models for the Assessment of Planning in Problem Solving.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-11-13 DOI: 10.1017/psy.2025.10042

Andrea Brancaccio, Debora de Chiusole, Ottavia M Epifania, Pasquale Anselmi, Matilde Spinoso, Noemi Mazzoni, Alice Bacherini, Matteo Orsoni, Sara Giovagnoli, Irene Pierluigi, Mariagrazia Benassi, Giulia Balboni, Luca Stefanutti

Tower tasks are popular tools used to measure planning skills. The sequences of moves undertaken by the respondents in solving tower tasks might provide important and useful information to shed light on their planning skills. The article focuses on the distinction between a situation where planning occurs before action (pre-planning) and one where planning and action are interlaced throughout the execution of the task (interim-planning). While a model for pre-planning was already developed by Stefanutti et al. (2021), an alternative model for interim-planning is proposed here. The two models are compared with one another in an empirical study. In accordance with the literature on the development of planning skills, the pre-planning model better fits data collected on individuals aged 14 and older, while the interim-planning model displays a better fit with data collected on individuals aged 4-8. This result is further corroborated by the analysis of the time performance.

Citations: 0

Multidimensional Generalized Partial Preference Model for Forced-Choice Items.

IF 3.1 | Zone 2, Psychology | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika

Pub Date : 2025-11-13 DOI: 10.1017/psy.2025.10054

Daniel C Furr, Jianbin Fu

A ranking pattern approach is proposed to build item response theory (IRT) models for forced-choice (FC) items. This new approach is an addition to the two existing approaches, sequential selection and Thurstone's law of pairwise comparison. A new dominance IRT model, the multidimensional generalized partial preference model (MGPPM), is proposed for FC items with any number (greater than 1) of statements. Maximum marginal likelihood estimation using an expectation-maximization algorithm (MML-EM) and Markov chain Monte Carlo (MCMC) estimation are developed. A simulation study is conducted to show satisfactory parameter recovery on triplet and tetrad data. The relationships between the newly proposed approach/model and the existing approaches/models are described, and the MGPPM, Thurstonian IRT (TIRT) model, and Triplet-2PLM are compared when applied to simulated and real triplet data. The new approach offers more flexible IRT modeling than the other two approaches under different assumptions, and the MGPPM is more statistically elegant than the TIRT and Triplet-2PLM.
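To illustrate what a "ranking pattern" is as a unit of analysis, the sketch below enumerates all possible rankings of the statements in a forced-choice block and lists the pairwise preferences each ranking implies. A block of n statements has n! ranking patterns (6 for a triplet, 24 for a tetrad), and each pattern implies n(n-1)/2 pairwise comparisons. The statement labels and function names are hypothetical, and the MGPPM response probabilities are not implemented here.

```python
from itertools import permutations

def ranking_patterns(statements):
    """All complete rankings (most preferred first) of a forced-choice block."""
    return list(permutations(statements))

def implied_pairwise(pattern):
    """Pairwise preferences (preferred, less preferred) implied by one ranking."""
    return {(hi, lo)
            for i, hi in enumerate(pattern)
            for lo in pattern[i + 1:]}

triplet = ranking_patterns(["A", "B", "C"])
tetrad = ranking_patterns(["A", "B", "C", "D"])

print(len(triplet), "ranking patterns for a triplet")
print(len(tetrad), "ranking patterns for a tetrad")
print(sorted(implied_pairwise(("A", "B", "C"))))
```

A pairwise-comparison approach works with the implied pairs, whereas a ranking pattern approach treats each complete ordering as a single response category.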

Citations: 0
