
Latest Publications in Applied Psychological Measurement

Accuracy in Invariance Detection With Multilevel Models With Three Estimators.
IF 1 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-24 | DOI: 10.1177/01466216251325644
W Holmes Finch, Cihan Demir, Brian F French, Thao Vo

Applied and simulation studies document model convergence and accuracy issues in differential item functioning (DIF) detection with multilevel models, hindering detection. This study aimed to evaluate the effectiveness of various estimation techniques in addressing these issues and to ensure robust DIF detection. We conducted a simulation study to investigate the performance of multilevel logistic regression models with predictors at level 2 across different estimation procedures, including maximum likelihood estimation (MLE), Bayesian estimation, and generalized estimating equations (GEE). The simulation results demonstrated that all three estimators maintained control over the Type I error rate across conditions. In most cases, GEE had comparable or higher power than MLE for identifying DIF, with Bayesian estimation having the lowest power. When potentially important covariates at levels 1 and 2 were included in the model, power for all methods was higher. These results suggest that in many cases where multilevel logistic regression is used for DIF detection, GEE offers a viable option for researchers, and that including important contextual variables at all levels of the data is desirable. Implications for practice are discussed.
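The logistic-regression route to DIF detection can be illustrated with a deliberately simplified, single-level sketch: fit nested logistic regressions (with and without a group effect) by maximum likelihood and flag DIF with a likelihood-ratio test. The data, coefficients, and sample size below are invented, and the multilevel/GEE machinery of the study is omitted.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Returns the coefficient vector and the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])  # tiny ridge for stability
        beta += np.linalg.solve(H, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    ll = np.sum(y * np.log(p + 1e-12) + (1.0 - y) * np.log(1.0 - p + 1e-12))
    return beta, ll

rng = np.random.default_rng(1)
n = 4000
total = rng.normal(0.0, 1.0, n)              # matching criterion (e.g., rest score)
group = rng.integers(0, 2, n).astype(float)  # reference vs. focal group
logit = 0.9 * total + 0.8 * group            # item with uniform DIF of 0.8 logits
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

ones = np.ones(n)
_, ll0 = fit_logistic(np.column_stack([ones, total]), y)         # no-DIF model
_, ll1 = fit_logistic(np.column_stack([ones, total, group]), y)  # adds group effect
lr = 2.0 * (ll1 - ll0)  # compared to chi-square(1); large values flag DIF
```

With a true group effect of 0.8 logits and n = 4000, the likelihood-ratio statistic far exceeds the chi-square(1) critical value of 3.84.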

Citations: 0
Calculating Bias in Test Score Equating in a NEAT Design.
IF 1 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-24 | DOI: 10.1177/01466216251330305
Marie Wiberg, Inga Laukaityte

Test score equating is used to make scores from different test forms comparable, even when groups differ in ability. In practice, the non-equivalent groups with anchor test (NEAT) design is commonly used. The overall aim was to compare the amount of bias under different conditions when using either chained equating or frequency estimation with five different criterion functions: the identity function, linear equating, equipercentile equating, chained equating, and frequency estimation. We used real test data from a multiple-choice, binary-scored college admissions test to illustrate that the choice of criterion function matters. Further, we simulated data in line with the empirical data to examine differences in ability between groups, in item difficulty, in anchor test form and regular test form length, in correlations between the anchor test form and regular test forms, and in sample size. The results indicate that how bias is defined heavily affects the conclusions we draw about which equating method is to be preferred in different scenarios. Practical implications for standardized tests are given, together with recommendations on how to calculate bias when evaluating equating transformations.
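A minimal sketch of how a criterion function enters a bias calculation: apply the standard linear equating transformation to simulated form scores and measure the deviation from the identity criterion (i.e., no transformation at all). The score distributions and score grid are invented; the NEAT design, chained equating, and frequency estimation of the study are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated total scores on two forms; form Y runs about two points harder.
x = rng.normal(25.0, 5.0, 10000)
y = rng.normal(23.0, 5.0, 10000)

def linear_equate(scores, x_ref, y_ref):
    """Linear equating of X-scale scores onto the Y scale:
    e_Y(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)."""
    return y_ref.mean() + y_ref.std() / x_ref.std() * (scores - x_ref.mean())

grid = np.arange(10, 41, dtype=float)
equated = linear_equate(grid, x, y)
# Bias relative to the identity criterion function: how far the equating
# transformation moves each score away from leaving it unchanged.
bias_vs_identity = equated - grid
```

Against the identity criterion, the linear transformation shows an average bias of roughly -2 score points here; a different criterion function would yield a different bias profile for the very same transformation.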

Citations: 0
On a Reparameterization of the MC-DINA Model.
IF 1 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-11 | DOI: 10.1177/01466216251324938
Lawrence T DeCarlo

The MC-DINA model is a cognitive diagnosis model (CDM) for multiple-choice items that was introduced by de la Torre (2009). The model extends the usual CDM in two basic ways: it allows for nominal responses instead of only dichotomous responses, and it allows skills to affect not only the choice of the correct response but also the choice of distractors. Here it is shown that the model can be re-expressed as a multinomial logit model with latent discrete predictors, that is, as a multinomial mixture model; a signal detection-like parameterization is also used. The reparameterization clarifies details about the structure and assumptions of the model, especially with respect to distractors, and helps to reveal parameter restrictions, which in turn have implications for psychological interpretations of the data and for issues with respect to statistical estimation. The approach suggests parsimonious models that are useful for practical applications, particularly for small sample sizes. The restrictions are shown to appear for items from the TIMSS 2007 fourth grade exam.
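The multinomial-logit view can be made concrete by giving each latent group its own vector of option logits and mapping them to choice probabilities with a softmax. The item, logit values, and keyed option below are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Multinomial-logit choice probabilities from option logits."""
    z = z - z.max()  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 4-option item with option 1 (0-indexed) keyed correct.
# Masters of the required skills peak on the key; non-masters are drawn
# toward a particular distractor, as the MC-DINA structure allows.
logits_master = np.array([-1.0, 2.0, 0.0, -0.5])
logits_nonmaster = np.array([0.5, -0.5, 0.8, 0.2])

p_master = softmax(logits_master)
p_nonmaster = softmax(logits_nonmaster)
```

In this toy setup the master class concentrates on the key while the non-master class peaks on a distractor, which is exactly the kind of structure the skills-affect-distractors assumption encodes.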

Citations: 0
Modeling Within- and Between-Person Differences in the Use of the Middle Category in Likert Scales.
IF 1 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-02 | DOI: 10.1177/01466216251322285
Jesper Tijmstra, Maria Bolsinova

When using Likert scales, the inclusion of a middle-category response option poses a challenge for the valid measurement of the psychological attribute of interest. While this middle category is often included to provide respondents with a neutral response option, respondents may in practice also select this category when they do not want to or cannot give an informative response. If one analyzes the response data without considering these two possible uses of the middle response category, measurement may be confounded. In this paper, we propose a response-mixture IRTree model for the analysis of Likert-scale data. This model acknowledges that the middle response category can either be selected as a non-response option (and hence be uninformative for the attribute of interest) or to communicate a neutral position (and hence be informative), and that this choice depends on both person and item characteristics. For each observed middle-category response, the probability that it was intended to be informative is modeled, and both the attribute of substantive interest and a non-response tendency are estimated. The performance of the model is evaluated in a simulation study, and the procedure is applied to empirical data from personality psychology.
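The two uses of the middle category can be sketched as a small probability tree: a middle response is either a non-response (driven by a non-response tendency) or an informative neutral answer (driven by the substantive trait). The node functions and parameter values below are invented for illustration and are not the paper's parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_middle(theta, eta, b_nonresp, b_neutral):
    """Probability of the middle category under a two-node tree.
    theta: substantive trait; eta: non-response tendency (both latent).
    Node 1: select the middle category as a non-response.  Node 2: an
    informative response lands on 'neutral' when theta is close to the
    (assumed) item location b_neutral."""
    p_nr = sigmoid(eta - b_nonresp)
    p_neutral = sigmoid(-(theta - b_neutral) ** 2)
    return p_nr + (1.0 - p_nr) * p_neutral

theta, eta = 0.3, -0.2
pm = p_middle(theta, eta, b_nonresp=0.5, b_neutral=0.0)
# Bayes decomposition of an observed middle response: the share that the
# model attributes to non-response rather than a genuinely neutral position.
share_nonresponse = sigmoid(eta - 0.5) / pm
```

This decomposition is the response-mixture idea in miniature: every observed middle response carries a model-implied probability of having been informative.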

Citations: 0
Weighted Answer Similarity Analysis.
IF 1 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | DOI: 10.1177/01466216251322353
Nicholas Trout, Kylie Gorney

Romero et al. (2015; see also Wollack, 1997) developed the ω statistic as a method for detecting unusually similar answers between pairs of examinees. For each pair, the ω statistic considers whether the observed number of similar answers is significantly larger than the expected number of similar answers. However, one limitation of ω is that it does not account for the particular items on which similar answers are observed. Therefore, in this study, we propose a weighted version of the ω statistic that takes this information into account. We compare the performance of the new and existing statistics using detailed simulations in which several factors are manipulated. Results show that while both the new and existing statistics are able to control the Type I error rate, the new statistic is more powerful, on average.
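The logic of a similarity statistic of this kind can be sketched as follows: given model-implied option probabilities for two examinees, the number of chance matches has a computable mean and variance, and the observed match count is standardized against them. The probabilities and the observed count below are simulated placeholders, and this is a generic match statistic rather than the exact ω or its weighted extension.

```python
import numpy as np

rng = np.random.default_rng(3)
n_items, n_options = 60, 4
# Model-implied probabilities that each examinee picks each option
# (in practice these would come from a fitted nominal response model).
p1 = rng.dirichlet(np.ones(n_options), size=n_items)
p2 = rng.dirichlet(np.ones(n_options), size=n_items)

# Under independent responding, item i matches with prob. sum_k p1[i,k]*p2[i,k].
match_p = (p1 * p2).sum(axis=1)
expected = match_p.sum()
sd = np.sqrt((match_p * (1.0 - match_p)).sum())

observed = 38  # observed number of identical answers for this pair
z = (observed - expected) / sd  # large positive values flag unusual similarity
```

A weighted version would replace the flat count with item-specific weights before computing the expectation and variance, which is the refinement the article proposes.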

Citations: 0
Optimal Test Design for Estimation of Mean Ability Growth.
IF 1.2 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | Epub Date: 2024-10-15 | DOI: 10.1177/01466216241291233
Jonas Bjermo

The design of an achievement test is crucial for many reasons. This article focuses on a population's ability growth between school grades. We define design as the allocation of test items with respect to their difficulties. The objective is to present an optimal test design method for estimating mean and percentile ability growth with good precision. We use the asymptotic expression of the variance in terms of the test information. With that criterion for optimization, we propose to use particle swarm optimization to find the optimal design. The results show that the allocation of the item difficulties depends on item discrimination and the magnitude of the ability growth. The optimization function depends on the examinees' abilities and hence on the value of the unknown mean ability growth. Therefore, we also use an optimum-in-average design and conclude that it is robust to uncertainty in the mean ability growth. In practice, a test is assembled from items stored in an item pool with calibrated item parameters. Hence, we also perform a discrete optimization using simulated annealing and compare the results to the particle swarm optimization.
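A bare-bones particle swarm over item difficulties gives the flavor of the optimization: each particle is a candidate vector of 2PL difficulties, and the objective is the (negative) test information averaged over plausible abilities, a proxy for the asymptotic variance criterion. The discrimination value, ability distribution, and all swarm settings are illustrative choices, not the article's.

```python
import numpy as np

rng = np.random.default_rng(11)
a = 1.2                              # common 2PL discrimination (assumed)
thetas = rng.normal(0.5, 1.0, 500)   # plausible abilities around the growth mean

def neg_avg_info(b):
    """Negative 2PL test information, averaged over the ability draws.
    Item information is a^2 * p * (1 - p); minimizing the negative
    average maximizes expected precision."""
    p = 1.0 / (1.0 + np.exp(-a * (thetas[:, None] - b[None, :])))
    return -np.mean((a**2 * p * (1.0 - p)).sum(axis=1))

n_particles, n_items, iters = 30, 20, 200
pos = rng.uniform(-3.0, 3.0, (n_particles, n_items))  # candidate difficulty vectors
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([neg_avg_info(x) for x in pos])
g = pbest[pbest_val.argmin()].copy()                  # global-best design
g_val = pbest_val.min()
start_val = g_val

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
    pos = pos + vel
    vals = np.array([neg_avg_info(x) for x in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    if pbest_val.min() < g_val:
        g_val = pbest_val.min()
        g = pbest[pbest_val.argmin()].copy()
```

The swarm's global best never worsens by construction; a discrete variant over a calibrated item pool (as with the article's simulated annealing) would restrict `pos` to pool items instead of continuous difficulties.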

Citations: 0
A Two-Step Q-Matrix Estimation Method.
IF 1.2 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | Epub Date: 2024-10-10 | DOI: 10.1177/01466216241284418
Hans-Friedrich Köhn, Chia-Yi Chiu, Olasumbo Oluwalana, Hyunjoo Kim, Jiaxi Wang

Cognitive Diagnosis Models in educational measurement are restricted latent class models that describe ability in a knowledge domain as a composite of latent skills an examinee may have mastered or failed. Different combinations of skills define distinct latent proficiency classes to which examinees are assigned based on test performance. Items of cognitively diagnostic assessments are characterized by skill profiles specifying which skills are required for a correct item response. The item-skill profiles of a test form its Q-matrix. The validity of cognitive diagnosis depends crucially on the correct specification of the Q-matrix. Typically, Q-matrices are determined by curricular experts. However, expert judgment is fallible. Data-driven estimation methods have been developed with the promise of greater accuracy in identifying the Q-matrix of a test. Yet, many of the extant methods encounter computational feasibility issues, either in the form of excessive CPU time or inadmissible estimates. In this article, a two-step algorithm for estimating the Q-matrix is proposed that can be used with any cognitive diagnosis model. Simulations showed that the new method outperformed extant estimation algorithms and was computationally more efficient. It was also applied to Tatsuoka's famous fraction-subtraction data. The paper concludes with a discussion of theoretical and practical implications of the findings.
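The Q-matrix machinery can be made concrete with the DINA ideal-response pattern: a latent class answers an item correctly (ignoring slips and guesses) exactly when it masters every skill the item's Q-matrix row requires. The toy Q-matrix below is invented for illustration.

```python
import numpy as np

# Toy Q-matrix: 4 items by 3 skills (1 = the item requires that skill).
Q = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]])

# All 2^3 latent skill profiles, one row per proficiency class.
profiles = np.array([[(i >> k) & 1 for k in range(3)] for i in range(8)])

# DINA ideal response: 1 iff the class masters every required skill.
ideal = (profiles[:, None, :] >= Q[None, :, :]).all(axis=2).astype(int)
```

Data-driven Q-matrix estimation essentially searches for the Q whose implied patterns like `ideal` are most consistent with the observed responses, which is why misspecifying even one Q-matrix entry distorts class assignment.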

Citations: 0
The Improved EMS Algorithm for Latent Variable Selection in M3PL Model.
IF 1.2 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | Epub Date: 2024-10-21 | DOI: 10.1177/01466216241291237
Laixu Shang, Ping-Feng Xu, Na Shan, Man-Lai Tang, Qian-Zhen Zheng

One of the main concerns in multidimensional item response theory (MIRT) is to detect the relationship between items and latent traits, which can be treated as a latent variable selection problem. An attractive method for latent variable selection in the multidimensional 2-parameter logistic (M2PL) model is to minimize the observed Bayesian information criterion (BIC) via the expectation model selection (EMS) algorithm. The EMS algorithm extends the EM algorithm and allows updates of the model (e.g., the loading structure in MIRT) in the iterations along with the parameters under the model. As an extension of the M2PL model, the multidimensional 3-parameter logistic (M3PL) model introduces an additional guessing parameter, which makes latent variable selection more challenging. In this paper, a well-designed EMS algorithm, named improved EMS (IEMS), is proposed to accurately and efficiently detect the underlying true loading structure in the M3PL model; it also works for the M2PL model. In simulation studies, we compare the IEMS algorithm with several state-of-the-art methods, and IEMS is competitive in terms of model recovery, estimation precision, and computational efficiency. The IEMS algorithm is illustrated by its application to two real data sets.
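The selection criterion itself is easy to state: among candidate loading structures, choose the one minimizing BIC = -2 log L + k log n, which trades fit against parsimony. The candidate log-likelihoods below are fabricated purely to illustrate the trade-off; the EMS/IEMS iteration that searches this space efficiently is not shown.

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: smaller is better."""
    return -2.0 * log_lik + n_params * np.log(n_obs)

# Hypothetical candidate loading structures for one item (log-likelihoods
# are made up): the cross-loading fits best but pays a parameter penalty.
candidates = {
    "loads on trait 1": bic(-5210.4, 3, 1000),
    "loads on trait 2": bic(-5235.9, 3, 1000),
    "loads on traits 1 and 2": bic(-5208.1, 4, 1000),
}
best = min(candidates, key=candidates.get)
```

Here the simpler one-trait structure wins despite a slightly worse fit, which is exactly the parsimony pressure that keeps BIC-based selection from over-assigning cross-loadings.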

Citations: 0
Impact of Parameter Predictability and Joint Modeling of Response Accuracy and Response Time on Ability Estimates.
IF 1 | CAS Region 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-02-26 | DOI: 10.1177/01466216251322646
Maryam Pezeshki, Susan Embretson

To maintain test quality, a large supply of items is typically desired. Automatic item generation can reduce cost and labor, especially if the generated items have predictable item parameters, possibly reducing or eliminating the need for empirical tryout. However, the effect of different levels of item parameter predictability on the accuracy of trait estimation using item response theory models is unclear. If predictability is lower, adding response time as a collateral source of information may mitigate the effect on trait estimation accuracy. The present study investigates the impact of varying item parameter predictability on trait estimation accuracy, along with the impact of adding response time as a collateral source of information. Results indicated that trait estimation accuracy using item-family model-based item parameters differed only slightly from using known item parameters. Somewhat larger trait estimation errors resulted from using cognitive complexity features to predict item parameters. Further, adding response times to the model resulted in more accurate trait estimation for tests with lower item difficulty levels (e.g., achievement tests). Implications for item generation and the response processes aspect of validity are discussed.
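Using response time as collateral information can be sketched with a grid-based EAP estimate: a 2PL likelihood for the responses, a lognormal likelihood for log response times, and a bivariate normal prior linking ability and speed. All item parameters, the assumed speed/ability correlation (0.3), and the single simulated examinee are invented; the study's actual joint model is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
n_items = 20
a = np.full(n_items, 1.0)             # 2PL discriminations
b = rng.normal(0.0, 1.0, n_items)     # 2PL difficulties
alpha = np.full(n_items, 1.5)         # RT discriminations (lognormal model)
beta = rng.normal(0.0, 0.3, n_items)  # RT time intensities

true_theta, true_tau = 1.0, 0.4       # simulated ability and speed
p_true = 1.0 / (1.0 + np.exp(-a * (true_theta - b)))
y = (rng.random(n_items) < p_true).astype(float)
log_t = beta - true_tau + rng.normal(0.0, 1.0 / alpha)  # log response times

theta_grid = np.linspace(-4.0, 4.0, 81)
tau_grid = np.linspace(-2.0, 2.0, 41)
TH, TA = np.meshgrid(theta_grid, tau_grid, indexing="ij")

# 2PL response log-likelihood over the theta grid
P = 1.0 / (1.0 + np.exp(-a * (theta_grid[:, None] - b[None, :])))
ll_resp = (y * np.log(P) + (1.0 - y) * np.log(1.0 - P)).sum(axis=1)

# Lognormal RT log-likelihood over the tau grid
resid = log_t[None, :] - (beta[None, :] - tau_grid[:, None])
ll_rt = (-0.5 * (alpha[None, :] * resid) ** 2).sum(axis=1)

# Bivariate standard-normal prior with the assumed ability/speed correlation
rho = 0.3
log_prior = -(TH**2 - 2.0 * rho * TH * TA + TA**2) / (2.0 * (1.0 - rho**2))

log_post = ll_resp[:, None] + ll_rt[None, :] + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()
theta_eap_joint = (post.sum(axis=1) * theta_grid).sum()

# Responses-only EAP for comparison (standard-normal prior on theta)
w = np.exp(ll_resp - ll_resp.max()) * np.exp(-theta_grid**2 / 2.0)
w /= w.sum()
theta_eap_resp = (w * theta_grid).sum()
```

Comparing `theta_eap_joint` with `theta_eap_resp` shows how the response times pull the ability estimate through the ability/speed correlation; with easy tests, where correct responses carry little information, this extra channel matters most.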

Few and Different: Detecting Examinees With Preknowledge Using Extended Isolation Forests.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 PSYCHOLOGY, MATHEMATICAL · Pub Date: 2025-02-20 · DOI: 10.1177/01466216251320403
Nate R Smith, Lisa A Keller, Richard A Feinberg, Chunyan Liu

Item preknowledge refers to the case where examinees have advance knowledge of test material prior to taking the examination. When examinees have item preknowledge, the scores that result from those item responses are not true reflections of the examinees' proficiency. Further, this contamination in the data also affects the item parameter estimates and therefore the scores of all examinees, regardless of whether they had prior knowledge. To ensure the validity of test scores, it is essential to identify both issues: compromised items (CIs) and examinees with preknowledge (EWPs). In some cases, the CIs are known, and the task reduces to identifying the EWPs. However, due to the potential threat to validity, it is critical for high-stakes testing programs to have a process for routinely monitoring for evidence of EWPs, often when the CIs are unknown. Further, even knowing that specific items may have been compromised does not guarantee that any examinees had prior access to those items, or that examinees who did have prior access know how to use the preknowledge effectively. Therefore, this paper attempts to use response behavior to identify item preknowledge without knowing which items may or may not have been compromised. While most research in this area has relied on traditional psychometric models, we investigate the utility of an unsupervised machine learning algorithm, the extended isolation forest (EIF), to detect EWPs. Similar to previous research, the response behavior being analyzed is response time (RT) and response accuracy (RA).
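The extended isolation forest differs from the standard one in its split rule: instead of axis-parallel cuts, each node splits the data with a random hyperplane. The from-scratch sketch below illustrates that idea on simulated examinee features; the feature choices (mean log RT, proportion correct), the contamination rate, and the flagging cutoff are illustrative assumptions, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(42)

def c_factor(n):
    # average path length of unsuccessful BST search (iForest normalizer)
    if n <= 1:
        return 0.0
    h = np.log(n - 1) + 0.5772156649  # harmonic number approximation
    return 2 * h - 2 * (n - 1) / n

def grow_tree(X, depth, max_depth, rng):
    n, d = X.shape
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    # extended split: random hyperplane = random normal vector + intercept
    normal = rng.normal(size=d)
    point = rng.uniform(X.min(axis=0), X.max(axis=0))
    side = (X - point) @ normal <= 0
    if side.all() or (~side).all():
        return ("leaf", n)
    return ("node", normal, point,
            grow_tree(X[side], depth + 1, max_depth, rng),
            grow_tree(X[~side], depth + 1, max_depth, rng))

def path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        n = tree[1]
        return depth + c_factor(n)  # credit unbuilt subtree depth
    _, normal, point, left, right = tree
    branch = left if (x - point) @ normal <= 0 else right
    return path_length(x, branch, depth + 1)

def eif_scores(X, n_trees=100, sample_size=128, rng=rng):
    m = min(sample_size, len(X))
    max_depth = int(np.ceil(np.log2(m)))
    trees = [grow_tree(X[rng.choice(len(X), size=m, replace=False)],
                       0, max_depth, rng)
             for _ in range(n_trees)]
    mean_h = np.array([np.mean([path_length(x, t) for t in trees])
                       for x in X])
    return 2.0 ** (-mean_h / c_factor(m))   # higher = more anomalous

# simulated examinees: features = (mean log RT, proportion correct)
n_honest, n_pre = 480, 20
honest = np.column_stack([rng.normal(4.0, 0.3, n_honest),
                          rng.normal(0.60, 0.10, n_honest)])
# preknowledge pattern: unusually fast AND unusually accurate
pre = np.column_stack([rng.normal(3.0, 0.2, n_pre),
                       rng.normal(0.95, 0.03, n_pre)])
X = np.vstack([honest, pre])

scores = eif_scores(X)
flagged = np.argsort(scores)[-n_pre:]        # top anomaly scores
hit_rate = np.mean(flagged >= n_honest)      # true EWPs among the flags
print(f"hit rate among top {n_pre} flags: {hit_rate:.2f}")
```

The "few and different" premise maps directly onto the isolation mechanism: points that are both rare and far from the bulk of examinees are separated by random hyperplanes in few splits, yielding short average path lengths and hence high anomaly scores.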
