Nonparametric CD-CAT for multiple-choice items: Item selection method and Q-optimality
Yu Wang, Chia-Yi Chiu, Hans Friedrich Köhn
British Journal of Mathematical & Statistical Psychology, 78(1), 61–83. https://doi.org/10.1111/bmsp.12350

Computerized adaptive testing for cognitive diagnosis (CD-CAT) achieves remarkable estimation efficiency and accuracy by adaptively selecting and then administering items tailored to each examinee. Item selection is a pivotal component of any CD-CAT algorithm, and various methods have been developed for binary responses. However, multiple-choice (MC) items, an important item type that allows richer diagnostic information to be extracted from incorrect answers, have been underemphasized. Currently, the Jensen–Shannon divergence (JSD) index introduced by Yigit et al. (Applied Psychological Measurement, 2019, 43, 388) is the only item selection method designed exclusively for MC items. However, the JSD index requires a large sample to calibrate item parameters, which may be infeasible when the calibration sample is small or nonexistent. To bridge this gap, this study first proposes a nonparametric item selection method for MC items (MC-NPS) that implements a novel discrimination power measure of an item's ability to distinguish effectively among different attribute profiles. A Q-optimal procedure for MC items is also developed to improve classification during the initial phase of a CD-CAT algorithm. The effectiveness and efficiency of the two proposed algorithms were confirmed by simulation studies.
{"title":"Nonparametric CD-CAT for multiple-choice items: Item selection method and Q-optimality","authors":"Yu Wang, Chia-Yi Chiu, Hans Friedrich Köhn","doi":"10.1111/bmsp.12350","DOIUrl":"10.1111/bmsp.12350","url":null,"abstract":"<p>Computerized adaptive testing for cognitive diagnosis (CD-CAT) achieves remarkable estimation efficiency and accuracy by adaptively selecting and then administering items tailored to each examinee. The process of item selection stands as a pivotal component of a CD-CAT algorithm, with various methods having been developed for binary responses. However, multiple-choice (MC) items, an important item type that allows for the extraction of richer diagnostic information from incorrect answers, have been underemphasized. Currently, the Jensen–Shannon divergence (JSD) index introduced by Yigit et al. (<i>Applied Psychological Measurement</i>, 2019, 43, 388) is the only item selection method exclusively designed for MC items. However, the JSD index requires a large sample to calibrate item parameters, which may be infeasible when there is only a small or no calibration sample. To bridge this gap, the study first proposes a nonparametric item selection method for MC items (MC-NPS) by implementing novel discrimination power that measures an item's ability to effectively distinguish among different attribute profiles. A Q-optimal procedure for MC items is also developed to improve the classification during the initial phase of a CD-CAT algorithm. The effectiveness and efficiency of the two proposed algorithms were confirmed by simulation studies.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"78 1","pages":"61-83"},"PeriodicalIF":1.8,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12350","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141096972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Incorporating calibration errors in oral reading fluency scoring
Xin Qiao, Akihito Kamata, Cornelis Potgieter
British Journal of Mathematical & Statistical Psychology, 78(1), 44–60. https://doi.org/10.1111/bmsp.12348

Oral reading fluency (ORF) assessments are commonly used as curriculum-based measurements to screen at-risk readers and to evaluate the effectiveness of interventions. As in standard item response theory (IRT) practice, calibrated passage parameter estimates are currently used in model-based ORF scoring as if they were population values. However, unaccounted-for calibration errors may bias ORF score estimates and, in particular, lead to underestimated standard errors (SEs) of ORF scores. We therefore consider an approach that incorporates the calibration errors into latent variable scores. We further derive SEs of ORF scores based on the delta method to incorporate the calibration uncertainty. We conduct a simulation study to evaluate the recovery of point estimates and SEs of latent variable scores and ORF scores under various simulated conditions. Results suggest that ignoring calibration errors leads to underestimated latent variable score SEs and ORF score SEs, especially when the calibration sample is small.
{"title":"Incorporating calibration errors in oral reading fluency scoring","authors":"Xin Qiao, Akihito Kamata, Cornelis Potgieter","doi":"10.1111/bmsp.12348","DOIUrl":"10.1111/bmsp.12348","url":null,"abstract":"<p>Oral reading fluency (ORF) assessments are commonly used to screen at-risk readers and evaluate interventions' effectiveness as curriculum-based measurements. Similar to the standard practice in item response theory (IRT), calibrated passage parameter estimates are currently used as if they were population values in model-based ORF scoring. However, calibration errors that are unaccounted for may bias ORF score estimates and, in particular, lead to underestimated standard errors (SEs) of ORF scores. Therefore, we consider an approach that incorporates the calibration errors in latent variable scores. We further derive the SEs of ORF scores based on the delta method to incorporate the calibration uncertainty. We conduct a simulation study to evaluate the recovery of point estimates and SEs of latent variable scores and ORF scores in various simulated conditions. Results suggest that ignoring calibration errors leads to underestimated latent variable score SEs and ORF score SEs, especially when the calibration sample is small.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"78 1","pages":"44-60"},"PeriodicalIF":1.8,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140900014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pairwise stochastic approximation for confirmatory factor analysis of categorical data
Giuseppe Alfonzetti, Ruggero Bellio, Yunxiao Chen, Irini Moustaki
British Journal of Mathematical & Statistical Psychology, 78(1), 22–43. https://doi.org/10.1111/bmsp.12347
Pairwise likelihood is a limited-information method widely used to estimate latent variable models, including factor analysis of categorical data. It can often avoid evaluating high-dimensional integrals and, thus, is computationally more efficient than relying on the full likelihood. Despite its computational advantage, the pairwise likelihood approach can still be demanding for large-scale problems that involve many observed variables. We tackle this challenge by employing an approximation of the pairwise likelihood estimator, which is derived from an optimization procedure relying on stochastic gradients. The stochastic gradients are constructed by subsampling the pairwise log-likelihood contributions, for which the subsampling scheme controls the per-iteration computational complexity. The stochastic estimator is shown to be asymptotically equivalent to the pairwise likelihood one. However, finite-sample performance can be improved by compounding the sampling variability of the data with the uncertainty introduced by the subsampling scheme. We demonstrate the performance of the proposed method using simulation studies and two real data applications.
{"title":"Pairwise stochastic approximation for confirmatory factor analysis of categorical data","authors":"Giuseppe Alfonzetti, Ruggero Bellio, Yunxiao Chen, Irini Moustaki","doi":"10.1111/bmsp.12347","DOIUrl":"10.1111/bmsp.12347","url":null,"abstract":"<p>Pairwise likelihood is a limited-information method widely used to estimate latent variable models, including factor analysis of categorical data. It can often avoid evaluating high-dimensional integrals and, thus, is computationally more efficient than relying on the full likelihood. Despite its computational advantage, the pairwise likelihood approach can still be demanding for large-scale problems that involve many observed variables. We tackle this challenge by employing an approximation of the pairwise likelihood estimator, which is derived from an optimization procedure relying on stochastic gradients. The stochastic gradients are constructed by subsampling the pairwise log-likelihood contributions, for which the subsampling scheme controls the per-iteration computational complexity. The stochastic estimator is shown to be asymptotically equivalent to the pairwise likelihood one. However, finite-sample performance can be improved by compounding the sampling variability of the data with the uncertainty introduced by the subsampling scheme. We demonstrate the performance of the proposed method using simulation studies and two real data applications.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"78 1","pages":"22-43"},"PeriodicalIF":1.8,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140811401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in meta-analysis: A unifying modelling framework with measurement error correction
Betsy Jane Becker, Qian Zhang
British Journal of Mathematical & Statistical Psychology, 77(3), 395–428. https://doi.org/10.1111/bmsp.12345

In psychological studies, multivariate outcomes measured on the same individuals are often encountered, and the effects originating from these outcomes are consequently dependent. Multivariate meta-analysis examines the relationships among multivariate outcomes by estimating the mean effects and their variance–covariance matrices from a series of primary studies. In this paper we discuss a unified modelling framework for multivariate meta-analysis that also incorporates measurement error corrections. We focus on two types of effect sizes that are common in psychological studies: standardized mean differences (d) and correlations (r). Using generalized least squares estimation, we outline estimated mean vectors and variance–covariance matrices for d and r that are corrected for measurement error. Given the burgeoning research involving multivariate outcomes, and the largely overlooked ramifications of measurement error, we advocate addressing measurement error while conducting multivariate meta-analysis to enhance the replicability of psychological research.
{"title":"Advances in meta-analysis: A unifying modelling framework with measurement error correction","authors":"Betsy Jane Becker, Qian Zhang","doi":"10.1111/bmsp.12345","DOIUrl":"10.1111/bmsp.12345","url":null,"abstract":"<p>In psychological studies, multivariate outcomes measured on the same individuals are often encountered. Effects originating from these outcomes are consequently dependent. Multivariate meta-analysis examines the relationships of multivariate outcomes by estimating the mean effects and their variance–covariance matrices from series of primary studies. In this paper we discuss a unified modelling framework for multivariate meta-analysis that also incorporates measurement error corrections. We focus on two types of effect sizes, standardized mean differences (<i>d</i>) and correlations (<i>r</i>), that are common in psychological studies. Using generalized least squares estimation, we outline estimated mean vectors and variance–covariance matrices for <i>d</i> and <i>r</i> that are corrected for measurement error. Given the burgeoning research involving multivariate outcomes, and the largely overlooked ramifications of measurement error, we advocate addressing measurement error while conducting multivariate meta-analysis to enhance the replicability of psychological research.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"77 3","pages":"395-428"},"PeriodicalIF":1.8,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12345","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140798185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining regularization and logistic regression model to validate the Q-matrix for cognitive diagnosis model
Xiaojian Sun, Tongxin Zhang, Chang Nie, Naiqing Song, Tao Xin
British Journal of Mathematical & Statistical Psychology, 78(1), 1–21. https://doi.org/10.1111/bmsp.12346
The Q-matrix is an important component of most cognitive diagnosis models (CDMs); however, in empirical studies it typically rests on subject matter experts' judgements, which introduces the possibility of misspecified q-entries. To address this, statistical Q-matrix validation methods have been proposed to aid experts' judgement. A few of these methods, including the multiple logistic regression-based (MLR-B) method and the Hull method, can be applied to general CDMs, but they are either time-consuming or lack accuracy under certain conditions. In this study, we combine L1 regularization with the MLR model to validate the Q-matrix. Specifically, an L1 penalty term is imposed on the log-likelihood of the MLR model to select the necessary attributes for each item. A simulation study with various factors was conducted to examine the performance of the new method against the two existing methods. The results show that the regularized MLR-B method (a) produces the highest Q-matrix recovery rate (QRR) and true positive rate (TPR) under most conditions, especially with small sample sizes; (b) yields a slightly higher true negative rate (TNR) than either the MLR-B or the Hull method under most conditions; and (c) requires less computation time than the MLR-B method and computation time similar to that of the Hull method. A real data set is analysed for illustration purposes.
{"title":"Combining regularization and logistic regression model to validate the Q-matrix for cognitive diagnosis model","authors":"Xiaojian Sun, Tongxin Zhang, Chang Nie, Naiqing Song, Tao Xin","doi":"10.1111/bmsp.12346","DOIUrl":"10.1111/bmsp.12346","url":null,"abstract":"<p>Q-matrix is an important component of most cognitive diagnosis models (CDMs); however, it mainly relies on subject matter experts' judgements in empirical studies, which introduces the possibility of misspecified q-entries. To address this, statistical Q-matrix validation methods have been proposed to aid experts' judgement. A few of these methods, including the multiple logistic regression-based (MLR-B) method and the Hull method, can be applied to general CDMs, but they are either time-consuming or lack accuracy under certain conditions. In this study, we combine the <i>L</i><sub>1</sub> regularization and MLR model to validate the Q-matrix. Specifically, an <i>L</i><sub>1</sub> penalty term is imposed on the log-likelihood of the MLR model to select the necessary attributes for each item. A simulation study with various factors was conducted to examine the performance of the new method against the two existing methods. The results show that the regularized MLR-B method (a) produces the highest Q-matrix recovery rate (QRR) and true positive rate (TPR) for most conditions, especially with a small sample size; (b) yields a slightly higher true negative rate (TNR) than either the MLR-B or the Hull method for most conditions; and (c) requires less computation time than the MLR-B method and similar computation time as the Hull method. A real data set is analysed for illustration purposes.</p>","PeriodicalId":55322,"journal":{"name":"British Journal of Mathematical & Statistical Psychology","volume":"78 1","pages":"1-21"},"PeriodicalIF":1.8,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140677462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent years have seen a growing interest in the development of person-fit statistics for tests with polytomous items. Some of the most popular person-fit statistics for such tests belong to the class of standardized person-fit statistics,
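The canonical member of this class is the standardized log-likelihood statistic l_z of Drasgow, Levine and Williams (1985), shown here in its textbook polytomous form rather than as this entry's (truncated) specific proposal:

```latex
% x_ik = 1 if category k of item i was chosen, and P_ik(theta) is the
% model-implied probability of that category.
l_0 = \sum_{i=1}^{n} \sum_{k=0}^{m_i} x_{ik} \ln P_{ik}(\theta),
\qquad
l_z = \frac{l_0 - \mathrm{E}(l_0 \mid \theta)}{\sqrt{\operatorname{Var}(l_0 \mid \theta)}},
\quad \text{with} \quad
\mathrm{E}(l_0 \mid \theta) = \sum_{i=1}^{n} \sum_{k=0}^{m_i} P_{ik}(\theta) \ln P_{ik}(\theta)
```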