Reliability is an essential measure of how closely observed scores represent latent scores (reflecting constructs), assuming some latent variable measurement model. We present a general theoretical framework of reliability, placing emphasis on measuring the association between latent and observed scores. This framework was inspired by McDonald's (Psychometrika, 76, 511) regression framework, which highlighted the coefficient of determination as a measure of reliability. We extend McDonald's (Psychometrika, 76, 511) framework beyond coefficients of determination and introduce four desiderata for reliability measures (estimability, normalization, symmetry, and invariance). We also present theoretical examples to illustrate distinct measures of reliability and report on a numerical study that demonstrates the behaviour of different reliability measures. We conclude with a discussion on the use of reliability coefficients and outline future avenues of research.
{"title":"On a general theoretical framework of reliability.","authors":"Yang Liu, Jolynn Pek, Alberto Maydeu-Olivares","doi":"10.1111/bmsp.12360","DOIUrl":"https://doi.org/10.1111/bmsp.12360","url":null,"abstract":"<p><p>Reliability is an essential measure of how closely observed scores represent latent scores (reflecting constructs), assuming some latent variable measurement model. We present a general theoretical framework of reliability, placing emphasis on measuring the association between latent and observed scores. This framework was inspired by McDonald's (Psychometrika, 76, 511) regression framework, which highlighted the coefficient of determination as a measure of reliability. We extend McDonald's (Psychometrika, 76, 511) framework beyond coefficients of determination and introduce four desiderata for reliability measures (estimability, normalization, symmetry, and invariance). We also present theoretical examples to illustrate distinct measures of reliability and report on a numerical study that demonstrates the behaviour of different reliability measures. We conclude with a discussion on the use of reliability coefficients and outline future avenues of research.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142481468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper discusses estimation and limited-information goodness-of-fit test statistics in factor models for binary data using pairwise likelihood estimation and sampling weights. The paper extends the applicability of pairwise likelihood estimation for factor models with binary data to accommodate complex sampling designs. Additionally, it introduces two key limited-information test statistics: the Pearson chi-squared test and the Wald test. To enhance computational efficiency, the paper introduces modifications to both test statistics. The performance of the estimation and the proposed test statistics under simple random sampling and unequal probability sampling is evaluated using simulated data.
{"title":"Pairwise likelihood estimation and limited-information goodness-of-fit test statistics for binary factor analysis models under complex survey sampling.","authors":"Haziq Jamil, Irini Moustaki, Chris Skinner","doi":"10.1111/bmsp.12358","DOIUrl":"https://doi.org/10.1111/bmsp.12358","url":null,"abstract":"<p><p>This paper discusses estimation and limited-information goodness-of-fit test statistics in factor models for binary data using pairwise likelihood estimation and sampling weights. The paper extends the applicability of pairwise likelihood estimation for factor models with binary data to accommodate complex sampling designs. Additionally, it introduces two key limited-information test statistics: the Pearson chi-squared test and the Wald test. To enhance computational efficiency, the paper introduces modifications to both test statistics. The performance of the estimation and the proposed test statistics under simple random sampling and unequal probability sampling is evaluated using simulated data.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142481469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian analysis relies heavily on the Markov chain Monte Carlo (MCMC) algorithm to obtain random samples from posterior distributions. In this study, we compare the performance of MCMC stopping rules and provide a guideline for determining the termination point of the MCMC algorithm in latent variable models. In simulation studies, we examine the performance of four different MCMC stopping rules: potential scale reduction factor (PSRF), fixed-width stopping rule, Geweke's diagnostic, and effective sample size. Specifically, we evaluate these stopping rules in the context of the DINA model and the bifactor item response theory model, two commonly used latent variable models in educational and psychological measurement. Our simulation study findings suggest that single-chain approaches outperform multiple-chain approaches in terms of item parameter accuracy. However, when it comes to person parameter estimates, the effect of stopping rules diminishes. We caution against relying solely on the univariate PSRF, which is the most popular method, as it may terminate the algorithm prematurely and produce biased item parameter estimates if the cut-off value is not chosen carefully. Our research offers guidance to practitioners on choosing suitable stopping rules to improve the precision of the MCMC algorithm in models involving latent variables.
{"title":"MCMC stopping rules in latent variable modelling.","authors":"Sunbeom Kwon, Susu Zhang, Hans Friedrich Köhn, Bo Zhang","doi":"10.1111/bmsp.12357","DOIUrl":"https://doi.org/10.1111/bmsp.12357","url":null,"abstract":"<p><p>Bayesian analysis relies heavily on the Markov chain Monte Carlo (MCMC) algorithm to obtain random samples from posterior distributions. In this study, we compare the performance of MCMC stopping rules and provide a guideline for determining the termination point of the MCMC algorithm in latent variable models. In simulation studies, we examine the performance of four different MCMC stopping rules: potential scale reduction factor (PSRF), fixed-width stopping rule, Geweke's diagnostic, and effective sample size. Specifically, we evaluate these stopping rules in the context of the DINA model and the bifactor item response theory model, two commonly used latent variable models in educational and psychological measurement. Our simulation study findings suggest that single-chain approaches outperform multiple-chain approaches in terms of item parameter accuracy. However, when it comes to person parameter estimates, the effect of stopping rules diminishes. We caution against relying solely on the univariate PSRF, which is the most popular method, as it may terminate the algorithm prematurely and produce biased item parameter estimates if the cut-off value is not chosen carefully. Our research offers guidance to practitioners on choosing suitable stopping rules to improve the precision of the MCMC algorithm in models involving latent variables.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142481467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Normal ogive (NO) models have contributed substantially to the advancement of item response theory (IRT) and have become popular educational and psychological measurement models. However, estimating NO models remains computationally challenging. The purpose of this paper is to propose an efficient and reliable computational method for fitting NO models. Specifically, we introduce a novel and unified expectation-maximization (EM) algorithm for estimating NO models, including two-parameter, three-parameter, and four-parameter NO models. A key improvement in our EM algorithm lies in augmenting the NO model into a complete-data model within the exponential family, thereby substantially streamlining the EM iteration and avoiding numerical optimization in the M-step. Additionally, we propose a two-step expectation procedure for implementing the E-step, which reduces the dimensionality of the integration and makes numerical integration tractable. Moreover, we develop a computing procedure for estimating the standard errors (SEs) of the estimated parameters. Simulation results demonstrate the superior performance of our algorithm in terms of recovery accuracy, robustness, and computational efficiency. To further validate our methods, we apply them to real data from the Programme for International Student Assessment (PISA). The results affirm the reliability of the parameter estimates obtained using our method.
{"title":"A unified EM framework for estimation and inference of normal ogive item response models.","authors":"Xiangbin Meng, Gongjun Xu","doi":"10.1111/bmsp.12356","DOIUrl":"https://doi.org/10.1111/bmsp.12356","url":null,"abstract":"<p><p>Normal ogive (NO) models have contributed substantially to the advancement of item response theory (IRT) and have become popular educational and psychological measurement models. However, estimating NO models remains computationally challenging. The purpose of this paper is to propose an efficient and reliable computational method for fitting NO models. Specifically, we introduce a novel and unified expectation-maximization (EM) algorithm for estimating NO models, including two-parameter, three-parameter, and four-parameter NO models. A key improvement in our EM algorithm lies in augmenting the NO model to be a complete data model within the exponential family, thereby substantially streamlining the implementation of the EM iteration and avoiding the numerical optimization computation in the M-step. Additionally, we propose a two-step expectation procedure for implementing the E-step, which reduces the dimensionality of the integration and effectively enables numerical integration. Moreover, we develop a computing procedure for estimating the standard errors (SEs) of the estimated parameters. Simulation results demonstrate the superior performance of our algorithm in terms of its recovery accuracy, robustness, and computational efficiency. To further validate our methods, we apply them to real data from the Programme for International Student Assessment (PISA). The results affirm the reliability of the parameter estimates obtained using our method.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142481465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For several years, the evaluation of polytomous attributes in small-sample settings has posed a challenge for the application of cognitive diagnosis models. To enhance classification precision, the support vector machine (SVM) was introduced for estimating polytomous attributes, given its proven feasibility in dichotomous cases. Two simulation studies and an empirical study assessed the impact of various factors on SVM classification performance, including training sample size, attribute structure, guessing/slipping levels, number of attributes, number of attribute levels, and number of items. The results indicated that SVM outperformed the pG-DINA model in classification accuracy under dependent attribute structures and small sample sizes. SVM performance improved with an increased number of items but declined with higher guessing/slipping levels, more attributes, and more attribute levels. Empirical data further validated the applicability and advantages of SVMs.
{"title":"Applying support vector machines to a diagnostic classification model for polytomous attributes in small-sample contexts.","authors":"Xiaoyu Li, Shenghong Dong, Shaoyang Guo, Chanjin Zheng","doi":"10.1111/bmsp.12359","DOIUrl":"https://doi.org/10.1111/bmsp.12359","url":null,"abstract":"<p><p>Over several years, the evaluation of polytomous attributes in small-sample settings has posed a challenge to the application of cognitive diagnosis models. To enhance classification precision, the support vector machine (SVM) was introduced for estimating polytomous attribution, given its proven feasibility for dichotomous cases. Two simulation studies and an empirical study assessed the impact of various factors on SVM classification performance, including training sample size, attribute structures, guessing/slipping levels, number of attributes, number of attribute levels, and number of items. The results indicated that SVM outperformed the pG-DINA model in classification accuracy under dependent attribute structures and small sample sizes. SVM performance improved with an increased number of items but declined with higher guessing/slipping levels, more attributes, and more attribute levels. Empirical data further validated the application and advantages of SVMs.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142332841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When evaluating the effect of psychological treatments on a dichotomous outcome variable in a randomized controlled trial (RCT), covariate adjustment using logistic regression models is often applied. In the presence of covariates, average marginal effects (AMEs) are often preferred over odds ratios, as AMEs yield a clearer substantive and causal interpretation. However, standard error computation of AMEs neglects sampling-based uncertainty (i.e., covariate values are assumed to be fixed over repeated sampling), which leads to underestimation of AME standard errors in other generalized linear models (e.g., Poisson regression). In this paper, we present and compare approaches allowing for stochastic (i.e., randomly sampled) covariates in models for binary outcomes. In a simulation study, we investigated the quality of the AME and stochastic-covariate approaches focusing on statistical inference in finite samples. Our results indicate that the fixed-covariate approach provides reliable results only if there is no heterogeneity in interindividual treatment effects (i.e., presence of treatment-covariate interactions), while the stochastic-covariate approaches are preferable in all other simulated conditions. We provide an illustrative example from clinical psychology investigating the effect of a cognitive bias modification training on post-traumatic stress disorder while accounting for patients' anxiety using an RCT.
{"title":"Average treatment effects on binary outcomes with stochastic covariates.","authors":"Christoph Kiefer, Marcella L Woud, Simon E Blackwell, Axel Mayer","doi":"10.1111/bmsp.12355","DOIUrl":"https://doi.org/10.1111/bmsp.12355","url":null,"abstract":"<p><p>When evaluating the effect of psychological treatments on a dichotomous outcome variable in a randomized controlled trial (RCT), covariate adjustment using logistic regression models is often applied. In the presence of covariates, average marginal effects (AMEs) are often preferred over odds ratios, as AMEs yield a clearer substantive and causal interpretation. However, standard error computation of AMEs neglects sampling-based uncertainty (i.e., covariate values are assumed to be fixed over repeated sampling), which leads to underestimation of AME standard errors in other generalized linear models (e.g., Poisson regression). In this paper, we present and compare approaches allowing for stochastic (i.e., randomly sampled) covariates in models for binary outcomes. In a simulation study, we investigated the quality of the AME and stochastic-covariate approaches focusing on statistical inference in finite samples. Our results indicate that the fixed-covariate approach provides reliable results only if there is no heterogeneity in interindividual treatment effects (i.e., presence of treatment-covariate interactions), while the stochastic-covariate approaches are preferable in all other simulated conditions. We provide an illustrative example from clinical psychology investigating the effect of a cognitive bias modification training on post-traumatic stress disorder while accounting for patients' anxiety using an RCT.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141753440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The analysis of multiple bivariate correlations is often carried out by conducting simple tests to check whether each of them is significantly different from zero. In addition, pairwise differences are often judged by eye or by comparing the p-values of the individual tests of significance despite the existence of statistical tests for differences between correlations. This paper uses simulation methods to assess the accuracy (empirical Type I error rate), power, and robustness of 10 tests designed to check the significance of the difference between two dependent correlations with overlapping variables (i.e., the correlation between X1 and Y and the correlation between X2 and Y). Five of the tests turned out to be inadvisable because their empirical Type I error rates under normality differ greatly from the nominal alpha level of .05 either across the board or within certain sub-ranges of the parameter space. The remaining five tests were acceptable and their merits were similar in terms of all comparison criteria, although none of them was robust across all forms of non-normality explored in the study. Practical recommendations are given for the choice of a statistical test to compare dependent correlations with overlapping variables.
{"title":"Are alternative variables in a set differently associated with a target variable? Statistical tests and practical advice for dealing with dependent correlations.","authors":"Miguel A García-Pérez","doi":"10.1111/bmsp.12354","DOIUrl":"https://doi.org/10.1111/bmsp.12354","url":null,"abstract":"<p><p>The analysis of multiple bivariate correlations is often carried out by conducting simple tests to check whether each of them is significantly different from zero. In addition, pairwise differences are often judged by eye or by comparing the p-values of the individual tests of significance despite the existence of statistical tests for differences between correlations. This paper uses simulation methods to assess the accuracy (empirical Type I error rate), power, and robustness of 10 tests designed to check the significance of the difference between two dependent correlations with overlapping variables (i.e., the correlation between X<sub>1</sub> and Y and the correlation between X<sub>2</sub> and Y). Five of the tests turned out to be inadvisable because their empirical Type I error rates under normality differ greatly from the nominal alpha level of .05 either across the board or within certain sub-ranges of the parameter space. The remaining five tests were acceptable and their merits were similar in terms of all comparison criteria, although none of them was robust across all forms of non-normality explored in the study. Practical recommendations are given for the choice of a statistical test to compare dependent correlations with overlapping variables.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141460882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploratory cognitive diagnosis models have been widely used in psychology, education, and other fields. This paper focuses on determining the number of attributes in a widely used cognitive diagnosis model, the GDINA model. Under certain conditions on cognitive diagnosis models, we prove that the covariance matrix of the observed data has a special structure. Exploiting this structure, we propose an eigen-decomposition-based estimator of the number of attributes for the GDINA model. The performance of the proposed estimator is verified by simulation studies. Finally, the proposed estimator is applied to two real data sets: the Examination for the Certificate of Proficiency in English (ECPE) and the Big Five Personality (BFP) data.
{"title":"Determining the number of attributes in the GDINA model.","authors":"Juntao Wang, Jiangtao Duan","doi":"10.1111/bmsp.12349","DOIUrl":"https://doi.org/10.1111/bmsp.12349","url":null,"abstract":"<p><p>Exploratory cognitive diagnosis models have been widely used in psychology, education and other fields. This paper focuses on determining the number of attributes in a widely used cognitive diagnosis model, the GDINA model. Under some conditions of cognitive diagnosis models, we prove that there exists a special structure for the covariance matrix of observed data. Due to the special structure of the covariance matrix, an estimator based on eigen-decomposition is proposed for the number of attributes for the GDINA model. The performance of the proposed estimator is verified by simulation studies. Finally, the proposed estimator is applied to two real data sets Examination for the Certificate of Proficiency in English (ECPE) and Big Five Personality (BFP).</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141421977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computerized adaptive testing for cognitive diagnosis (CD-CAT) achieves remarkable estimation efficiency and accuracy by adaptively selecting and then administering items tailored to each examinee. The process of item selection stands as a pivotal component of a CD-CAT algorithm, with various methods having been developed for binary responses. However, multiple-choice (MC) items, an important item type that allows for the extraction of richer diagnostic information from incorrect answers, have been underemphasized. Currently, the Jensen-Shannon divergence (JSD) index introduced by Yigit et al. (Applied Psychological Measurement, 2019, 43, 388) is the only item selection method exclusively designed for MC items. However, the JSD index requires a large sample to calibrate item parameters, which may be infeasible when there is only a small calibration sample or none at all. To bridge this gap, the study first proposes a nonparametric item selection method for MC items (MC-NPS) by introducing a novel discrimination power measure that quantifies an item's ability to effectively distinguish among different attribute profiles. A Q-optimal procedure for MC items is also developed to improve classification during the initial phase of a CD-CAT algorithm. The effectiveness and efficiency of the two proposed algorithms were confirmed by simulation studies.
{"title":"Nonparametric CD-CAT for multiple-choice items: Item selection method and Q-optimality.","authors":"Yu Wang, Chia-Yi Chiu, Hans Friedrich Köhn","doi":"10.1111/bmsp.12350","DOIUrl":"https://doi.org/10.1111/bmsp.12350","url":null,"abstract":"<p><p>Computerized adaptive testing for cognitive diagnosis (CD-CAT) achieves remarkable estimation efficiency and accuracy by adaptively selecting and then administering items tailored to each examinee. The process of item selection stands as a pivotal component of a CD-CAT algorithm, with various methods having been developed for binary responses. However, multiple-choice (MC) items, an important item type that allows for the extraction of richer diagnostic information from incorrect answers, have been underemphasized. Currently, the Jensen-Shannon divergence (JSD) index introduced by Yigit et al. (Applied Psychological Measurement, 2019, 43, 388) is the only item selection method exclusively designed for MC items. However, the JSD index requires a large sample to calibrate item parameters, which may be infeasible when there is only a small or no calibration sample. To bridge this gap, the study first proposes a nonparametric item selection method for MC items (MC-NPS) by implementing novel discrimination power that measures an item's ability to effectively distinguish among different attribute profiles. A Q-optimal procedure for MC items is also developed to improve the classification during the initial phase of a CD-CAT algorithm. The effectiveness and efficiency of the two proposed algorithms were confirmed by simulation studies.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141096972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oral reading fluency (ORF) assessments are commonly used to screen at-risk readers and evaluate interventions' effectiveness as curriculum-based measurements. Similar to the standard practice in item response theory (IRT), calibrated passage parameter estimates are currently used as if they were population values in model-based ORF scoring. However, calibration errors that are unaccounted for may bias ORF score estimates and, in particular, lead to underestimated standard errors (SEs) of ORF scores. Therefore, we consider an approach that incorporates the calibration errors in latent variable scores. We further derive the SEs of ORF scores based on the delta method to incorporate the calibration uncertainty. We conduct a simulation study to evaluate the recovery of point estimates and SEs of latent variable scores and ORF scores in various simulated conditions. Results suggest that ignoring calibration errors leads to underestimated latent variable score SEs and ORF score SEs, especially when the calibration sample is small.
{"title":"Incorporating calibration errors in oral reading fluency scoring.","authors":"Xin Qiao, Akihito Kamata, Cornelis Potgieter","doi":"10.1111/bmsp.12348","DOIUrl":"https://doi.org/10.1111/bmsp.12348","url":null,"abstract":"<p><p>Oral reading fluency (ORF) assessments are commonly used to screen at-risk readers and evaluate interventions' effectiveness as curriculum-based measurements. Similar to the standard practice in item response theory (IRT), calibrated passage parameter estimates are currently used as if they were population values in model-based ORF scoring. However, calibration errors that are unaccounted for may bias ORF score estimates and, in particular, lead to underestimated standard errors (SEs) of ORF scores. Therefore, we consider an approach that incorporates the calibration errors in latent variable scores. We further derive the SEs of ORF scores based on the delta method to incorporate the calibration uncertainty. We conduct a simulation study to evaluate the recovery of point estimates and SEs of latent variable scores and ORF scores in various simulated conditions. Results suggest that ignoring calibration errors leads to underestimated latent variable score SEs and ORF score SEs, especially when the calibration sample is small.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":" ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140900014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}