Applied Psychological Measurement: Latest Articles, Page 2

Evaluating the Construct Validity of Instructional Manipulation Checks as Measures of Careless Responding to Surveys.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-11-01 Epub Date: 2024-09-20 DOI: 10.1177/01466216241284293

Mark C Ramsey, Nathan A Bowling, Preston S Menke

Careless responding measures serve several purposes, whether screening for careless responding or research that treats careless responding as a substantive variable. One approach to assessing carelessness in surveys is the instructional manipulation check. Despite its apparent popularity, little is known about the construct validity of instructional manipulation checks as measures of careless responding. Initial results are inconclusive, and no study has thoroughly evaluated the validity of the instructional manipulation check as a measure of careless responding. Across 2 samples (N = 762), we evaluated the construct validity of the instructional manipulation check within a nomological network. We found that the instructional manipulation check converged poorly with other measures of careless responding, weakly predicted participants' inability to recognize study content, and did not display incremental validity over existing measures of careless responding. Additional analyses revealed that instructional manipulation checks performed poorly compared to single scores from alternative careless responding measures, and that screening data with alternative measures produced gains in data quality similar to or greater than those from instructional manipulation checks. Based on these results, we do not recommend using instructional manipulation checks to assess or screen for careless responding to surveys.
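
To make the comparison concrete, here is a minimal sketch of two screening indices. An instructional manipulation check (IMC) is scored pass/fail. The longstring index, the longest run of identical consecutive answers, is one commonly used alternative careless responding measure. The function names, the instructed answer, and the cutoff of 5 are hypothetical illustration choices, not the authors' procedure.

```python
from itertools import groupby


def imc_passed(response: str, expected: str) -> bool:
    """True if the answer to the instructed item matches the instructed answer."""
    return response.strip().lower() == expected.strip().lower()


def longstring(responses: list) -> int:
    """Length of the longest run of identical consecutive responses."""
    if not responses:
        return 0
    return max(len(list(run)) for _, run in groupby(responses))


def flag_careless(responses, imc_response, imc_expected, max_run=5):
    """Return both screening flags; max_run=5 is a hypothetical cutoff."""
    return {
        "imc_fail": not imc_passed(imc_response, imc_expected),
        "longstring_flag": longstring(responses) > max_run,
    }


answers = [3, 3, 3, 3, 3, 3, 3, 2, 4, 5]  # made-up Likert responses
print(longstring(answers))  # → 7
print(flag_careless(answers, "Strongly agree", "None of the above"))
# → {'imc_fail': True, 'longstring_flag': True}
```

The two flags are computed independently. This lets a researcher see whether they agree on the same respondents, which is the kind of convergence the study examined.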


Volume 48, Issue 7-8, pp. 341-356. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501094/pdf/

Citations: 0

Estimating Test-Retest Reliability in the Presence of Self-Selection Bias and Learning/Practice Effects.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-11-01 Epub Date: 2024-09-17 DOI: 10.1177/01466216241284585

William C M Belzak, J R Lockwood

Test-retest reliability is often estimated using naturally occurring data from test repeaters. In settings such as admissions testing, test takers choose whether and when to retake an assessment. This self-selection can bias estimates of test-retest reliability for two reasons. Individuals who choose to retest are typically unrepresentative of the broader testing population, and differences among test takers in learning or practice effects may increase with the time between administrations. We develop a set of methods for estimating test-retest reliability from observational data while mitigating these sources of bias: sample weighting, polynomial regression, and Bayesian model averaging. We demonstrate the value of these methods for reducing bias and improving the precision of estimated reliability. The demonstration uses empirical and simulated data, both based on more than 40,000 repeaters of a high-stakes English language proficiency test. Finally, these methods generalize to settings in which only a single, error-prone measurement is taken repeatedly over time and where self-selection and/or changes to the underlying construct may be at play.
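
The sample-weighting idea can be illustrated with a weighted Pearson correlation between first and second scores. Each repeater carries a weight, for example an inverse probability of choosing to retest, intended to make repeaters resemble the full testing population. This is a minimal sketch under that assumption. It omits the polynomial-regression and Bayesian-model-averaging components, and the scores and weights below are made-up numbers.

```python
import math


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between first (x) and second (y) scores."""
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w must have the same length (at least 2)")
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


# Hypothetical data: scores on two attempts and illustrative weights that
# up-weight repeaters who resemble under-represented test takers.
first = [100, 110, 120, 130, 140]
second = [104, 111, 126, 128, 147]
weights = [1.0, 2.0, 1.5, 0.5, 1.0]
print(round(weighted_corr(first, second, weights), 3))
```

Only the relative sizes of the weights matter, so rescaling all weights by a constant leaves the estimate unchanged.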


Volume 48, Issue 7-8, pp. 323-340. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528726/pdf/

Citations: 0

A Mark-Recapture Approach to Estimating Item Pool Compromise.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-11-01 Epub Date: 2024-09-13 DOI: 10.1177/01466216241284410

Richard A Feinberg

Testing organizations routinely investigate whether secure exam material has been compromised and is consequently invalid for scoring and for inclusion on future assessments. Identifying individual compromised items is not enough. Knowing the degree to which a form is compromised can also inform two decisions: whether the form should no longer be administered, and whether an item pool is compromised so extensively that serious, broad action is needed to ensure the validity of score interpretations. Previous research on estimating the number of compromised items is sparse. However, the general problem of estimating a population's size has long been studied in ecological research. In this note, we illustrate the utility of the mark-recapture technique for estimating the population of compromised items. A brief demonstration first introduces the fundamental concepts, and a more realistic scenario then illustrates applicability to large-scale testing programs. An effective use of this technique would be to track changes in the estimated population longitudinally to inform operational test security strategies. Many variations on mark-recapture exist, and interpretation of the estimated population depends on several factors. Thus, this note is only meant to introduce mark-recapture as a useful tool for evaluating a testing organization's compromise mitigation procedures.
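
Here is a minimal sketch of the simplest two-sample mark-recapture estimators. It assumes a hypothetical mapping: "marked" items are those found exposed in a first sweep (for example, a search of exam-prep forums), "captured" items are those found in a second sweep, and "recaptured" items appear in both. The classic Lincoln-Petersen estimator assumes a closed population (no new leaks between sweeps) and equal detection probability for every compromised item. The note says many variations exist, so this is not necessarily the estimator it uses.

```python
def lincoln_petersen(marked: int, captured: int, recaptured: int) -> float:
    """Classic estimate N = M * C / R of total population size."""
    if recaptured == 0:
        raise ValueError("no overlap between sweeps; estimate is undefined")
    return marked * captured / recaptured


def chapman(marked: int, captured: int, recaptured: int) -> float:
    """Chapman's bias-corrected variant; defined even when recaptured == 0."""
    return (marked + 1) * (captured + 1) / (recaptured + 1) - 1


# Hypothetical sweeps: 40 items found exposed in sweep 1, 50 in sweep 2,
# 20 of which were also found in sweep 1.
print(lincoln_petersen(40, 50, 20))  # → 100.0
print(round(chapman(40, 50, 20), 1))
```

In this made-up example, 70 distinct items were actually observed across the two sweeps. The estimate of about 100 suggests roughly 30 compromised items went undetected, provided the assumptions hold.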


Volume 48, Issue 7-8, pp. 357-363. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528777/pdf/

Citations: 0

Effect of Differential Item Functioning on Computer Adaptive Testing Under Different Conditions.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-11-01 Epub Date: 2024-09-17 DOI: 10.1177/01466216241284295

Merve Sahin Kursad, Seher Yalcin

This study provides an overview of the effect of differential item functioning (DIF) on measurement precision, the test information function (TIF), and test effectiveness in computer adaptive tests (CATs). Simulated data for the study were generated and analyzed in RStudio. The manipulated conditions were item pool size, DIF type, DIF percentage, CAT item selection method, and test termination rule. The fixed conditions were sample size, ability parameter distribution, item response theory (IRT) model, DIF size, ability estimation method, test starting rule, and the item usage frequency method used in the CAT. To examine the effect of DIF, measurement precision, TIF, and test effectiveness were calculated. Results show that DIF has negative effects on measurement precision, TIF, and test effectiveness. In particular, the percentage of DIF items and the DIF type had statistically significant effects on measurement precision.
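
The abstract does not give the simulation code. As a hedged sketch of the mechanics involved, uniform DIF can be represented under a 2PL model as a shift in item difficulty for the focal group. Maximum-information item selection, one common CAT rule and assumed here, picks the unused item that is most informative at the current ability estimate. If the pool is calibrated on the reference group, a focal-group examinee who receives a DIF item faces a harder item than the CAT assumes. The pool values and the DIF shift of 0.5 are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_max_info(theta_hat, pool, used):
    """Index of the unused (a, b) item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *pool[i]))


# Hypothetical pool calibrated on the reference group.
pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
item = select_max_info(0.0, pool, used=set())
a, b = pool[item]
dif_shift = 0.5  # hypothetical uniform DIF: item is harder for the focal group
print(item, p_2pl(0.0, a, b), p_2pl(0.0, a, b + dif_shift))
```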


Citations: 0

The Improved EMS Algorithm for Latent Variable Selection in M3PL Model.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-10-21 DOI: 10.1177/01466216241291237

Laixu Shang, Ping-Feng Xu, Na Shan, Man-Lai Tang, Qian-Zhen Zheng

One of the main concerns in multidimensional item response theory (MIRT) is detecting the relationship between items and latent traits, which can be treated as a latent variable selection problem. An attractive method for latent variable selection in the multidimensional 2-parameter logistic (M2PL) model is to minimize the observed Bayesian information criterion (BIC) with the expectation model selection (EMS) algorithm. The EMS algorithm extends the EM algorithm by updating the model (e.g., the loading structure in MIRT) in each iteration along with the parameters under that model. As an extension of the M2PL model, the multidimensional 3-parameter logistic (M3PL) model introduces an additional guessing parameter, which makes latent variable selection more challenging. In this paper, a well-designed EMS algorithm, named improved EMS (IEMS), is proposed to accurately and efficiently detect the true underlying loading structure in the M3PL model; it also works for the M2PL model. In simulation studies, we compare the IEMS algorithm with several state-of-the-art methods, and the IEMS is competitive in terms of model recovery, estimation precision, and computational efficiency. The IEMS algorithm is illustrated with applications to two real data sets.
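
For orientation, here is a sketch of the M3PL item response function in the common slope-intercept form, P = c + (1 - c) * logistic(a . theta + d), together with the usual BIC formula. The parameterization is assumed rather than taken from the paper. Latent variable selection amounts to deciding which entries of the slope vector `a` are zero. The EMS/IEMS search that compares candidate loading patterns by observed BIC is not shown.

```python
import math


def m3pl_prob(theta, a, d, c):
    """M3PL probability of a correct response:
    c + (1 - c) * logistic(sum_k a_k * theta_k + d).
    A zero in `a` means the item does not load on that latent trait."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def bic(loglik, n_free_params, n_obs):
    """BIC = -2 log L + (number of free parameters) * log(N)."""
    return -2.0 * loglik + n_free_params * math.log(n_obs)


# Hypothetical item loading only on trait 1 of 2, with guessing c = 0.2.
loadings = [1.5, 0.0]
print(m3pl_prob([0.0, 0.0], loadings, d=0.0, c=0.2))
print(m3pl_prob([0.0, 3.0], loadings, d=0.0, c=0.2))  # trait 2 has no effect
```

Setting `c = 0` recovers the M2PL model. This is why a selection method designed for the M3PL model can also be applied to the M2PL model.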


Online first. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11559968/pdf/

Citations: 0

Optimal Test Design for Estimation of Mean Ability Growth.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-10-15 DOI: 10.1177/01466216241291233

Jonas Bjermo

The design of an achievement test is crucial for many reasons. This article focuses on a population's ability growth between school grades. We define design as the allocation of test items with respect to their difficulties. The objective is to present an optimal test design method for estimating the mean and percentile ability growth with good precision. We use the asymptotic expression of the variance in terms of the test information. With that optimization criterion, we propose using particle swarm optimization to find the optimal design. The results show that the allocation of item difficulties depends on item discrimination and on the magnitude of the ability growth. The optimization function depends on the examinees' abilities and hence on the unknown value of the mean ability growth. Therefore, we also use an optimum in-average design and conclude that it is robust to uncertainty in the mean ability growth. In practice, a test is assembled from items stored in an item pool with calibrated item parameters. Hence, we also perform a discrete optimization using simulated annealing and compare the results with those of particle swarm optimization.
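
Here is a simplified sketch of the kind of criterion being optimized. Under a 2PL model, the asymptotic variance of an ability estimate is approximately 1 / I(theta), where I is the test information. For one examinee at theta1 in the lower grade and theta1 + growth in the upper grade, a per-examinee proxy for the variance of the estimated growth is 1 / I(theta1) + 1 / I(theta1 + growth). This proxy assumes the same items at both grades and independent measurement errors. The paper's population-level criterion and its particle swarm and simulated annealing searches are not reproduced. Here, two hand-picked designs are simply compared, and all parameter values are hypothetical.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def total_info(theta, items):
    """Test information: sum of item informations over (a, b) pairs."""
    return sum(info_2pl(theta, a, b) for a, b in items)


def growth_variance(theta1, growth, items):
    """Asymptotic variance proxy for the estimated growth theta2 - theta1,
    assuming the same items at both grades and independent errors."""
    theta2 = theta1 + growth
    return 1.0 / total_info(theta1, items) + 1.0 / total_info(theta2, items)


growth = 0.5                     # hypothetical mean growth in logits
centered = [(1.0, 0.25)] * 20    # all difficulties at the midpoint
off_target = [(1.0, 2.0)] * 20   # all difficulties far above both grades
for name, design in [("centered", centered), ("off-target", off_target)]:
    print(name, round(growth_variance(0.0, growth, design), 4))
```

A search method would replace the two fixed designs with an optimizer over the difficulties, minimizing `growth_variance` or its average over an ability distribution.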


Online first. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11560061/pdf/

Citations: 0

A Two-Step Q-Matrix Estimation Method.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-10-10 DOI: 10.1177/01466216241284418

Hans-Friedrich Köhn, Chia-Yi Chiu, Olasumbo Oluwalana, Hyunjoo Kim, Jiaxi Wang

Cognitive diagnosis models in educational measurement are restricted latent class models that describe ability in a knowledge domain as a composite of latent skills that an examinee may or may not have mastered. Different combinations of skills define distinct latent proficiency classes, to which examinees are assigned based on test performance. Items of cognitively diagnostic assessments are characterized by skill profiles specifying which skills are required for a correct item response. The item-skill profiles of a test form its Q-matrix. The validity of cognitive diagnosis depends crucially on the correct specification of the Q-matrix. Typically, Q-matrices are determined by curricular experts. However, expert judgment is fallible. Data-driven estimation methods have been developed with the promise of greater accuracy in identifying the Q-matrix of a test. Yet many extant methods encounter computational feasibility issues, in the form of either excessive CPU time or inadmissible estimates. In this article, a two-step algorithm for estimating the Q-matrix is proposed that can be used with any cognitive diagnosis model. Simulations showed that the new method outperformed extant estimation algorithms and was computationally more efficient. It was also applied to Tatsuoka's famous fraction-subtraction data. The paper concludes with a discussion of the theoretical and practical implications of the findings.
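
To show how a Q-matrix row enters a cognitive diagnosis model, here is a sketch using the DINA model. DINA is one common CDM and is chosen here only as an example, since the proposed method is stated to work with any CDM. An examinee's ideal response is 1 only if they master every skill the item requires. The observed success probability is then 1 - slip, or the guessing probability otherwise. The Q-matrix and the slip and guess values are hypothetical, and the paper's two-step estimation algorithm is not shown.

```python
from itertools import product


def ideal_response(alpha, q_row):
    """1 if the profile masters every skill the item requires (q_k = 1), else 0."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def dina_prob(alpha, q_row, slip, guess):
    """DINA probability of a correct response."""
    return 1.0 - slip if ideal_response(alpha, q_row) else guess


def ideal_response_matrix(Q):
    """Ideal responses of every possible skill profile to every item in Q."""
    n_skills = len(Q[0])
    profiles = list(product([0, 1], repeat=n_skills))
    return profiles, [[ideal_response(p, q) for q in Q] for p in profiles]


# Hypothetical 3-item, 2-skill Q-matrix.
Q = [[1, 0],
     [0, 1],
     [1, 1]]
profiles, ideal = ideal_response_matrix(Q)
for p, row in zip(profiles, ideal):
    print(p, row)
```

A misspecified Q-matrix row changes these ideal responses for some profiles. This is why Q-matrix errors propagate into classification errors.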


Online first. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11560062/pdf/

Citations: 0

Item Response Modeling of Clinical Instruments With Filter Questions: Disentangling Symptom Presence and Severity.

IF 1 | Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2024-09-01 Epub Date: 2024-06-17 DOI: 10.1177/01466216241261709

Brooke E Magnus

Clinical instruments that use a filter/follow-up response format often produce data with excess zeros, especially when administered to nonclinical samples. When the unidimensional graded response model (GRM) is then fit to these data, parameter estimates and scale scores tend to suggest that the instrument measures individual differences only among individuals with severe levels of the psychopathology. In such scenarios, alternative item response models that explicitly account for excess zeros may be more appropriate. The multivariate hurdle graded response model (MH-GRM), which has been previously proposed for handling zero-inflated questionnaire data, includes two latent variables: susceptibility, which underlies responses to the filter question, and severity, which underlies responses to the follow-up question. Using both simulated and empirical data, the current research shows that compared to unidimensional GRMs, the MH-GRM is better able to capture individual differences across a wider range of psychopathology, and that when unidimensional GRMs are fit to data from questionnaires that include filter questions, individual differences at the lower end of the severity continuum largely go unmeasured. Practical implications are discussed.
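
Here is a simplified, single-item sketch of the hurdle structure described above. A filter item, modeled here as a 2PL item driven by susceptibility, determines whether the symptom is endorsed at all. A zero response means the filter was not endorsed. If it is endorsed, a graded response model item driven by severity yields ratings 1..K. The 2PL form of the filter item and all parameter values are assumptions for illustration. The correlation between the two latent variables, which matters for estimation, is not modeled.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def grm_probs(theta, a, thresholds):
    """Graded response model: probabilities of the len(thresholds) + 1
    ordered categories, given increasing thresholds."""
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def hurdle_probs(theta_sus, theta_sev, filter_item, followup_item):
    """Probabilities of 0 (filter not endorsed) and of severity ratings 1..K.

    filter_item = (a_f, b_f): 2PL item driven by susceptibility.
    followup_item = (a_s, thresholds): GRM item driven by severity,
    answered only when the filter is endorsed."""
    a_f, b_f = filter_item
    a_s, thresholds = followup_item
    p_endorse = logistic(a_f * (theta_sus - b_f))
    severity = grm_probs(theta_sev, a_s, thresholds)
    return [1.0 - p_endorse] + [p_endorse * p for p in severity]


# Hypothetical item parameters and a respondent low in susceptibility.
probs = hurdle_probs(-1.0, 0.0, (1.5, 1.0), (2.0, [-0.5, 0.5, 1.5]))
print([round(p, 3) for p in probs])
```

A unidimensional GRM would have to explain the many zeros with a single trait. The hurdle form instead lets low-susceptibility respondents produce zeros while severity still differentiates those who endorse the filter.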


Volume 48, Issue 6, pp. 235-256. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11331747/pdf/

Citations: 0

Using Auxiliary Item Information in the Item Parameter Estimation of a Graded Response Model for a Small to Medium Sample Size: Empirical Versus Hierarchical Bayes Estimation

Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2023-11-03 DOI: 10.1177/01466216231209758

Matthew Naveiras, Sun-Joo Cho

Marginal maximum likelihood estimation (MMLE) is commonly used for item response theory item parameter estimation. However, sufficiently large samples are not always attainable when studying rare populations. In this paper, empirical Bayes and hierarchical Bayes are presented as alternatives to MMLE for small samples. Both use auxiliary item information to estimate the item parameters of a graded response model with higher accuracy. Empirical Bayes and hierarchical Bayes methods are compared with MMLE to determine under what conditions these Bayes methods can outperform MMLE, and whether hierarchical Bayes can serve as an acceptable alternative to MMLE in conditions where MMLE fails to converge. In addition, empirical Bayes and hierarchical Bayes methods are compared to show how hierarchical Bayes, by acknowledging the uncertainty of item parameter estimates, can yield more accurate estimates of posterior variance than empirical Bayes. The proposed methods were evaluated via a simulation study. Simulation results showed that hierarchical Bayes methods can be acceptable alternatives to MMLE under various testing conditions, and we provide guidelines indicating which methods are recommended in different research situations. R functions are provided to implement the proposed methods.
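
Here is a generic sketch of how auxiliary item information can stabilize a small-sample estimate. It uses precision-weighted normal-normal shrinkage toward a prior mean predicted from an item feature. This is an illustration of the empirical Bayes idea under a normal approximation, not the authors' R functions. The linear predictor and all numbers are hypothetical. In empirical Bayes, the prior's hyperparameters are plugged in as point estimates. Hierarchical Bayes would also place priors on them, which propagates their uncertainty into the posterior variance; that step is not shown.

```python
def shrink(estimate, se, prior_mean, prior_sd):
    """Normal-normal posterior mean and variance for one item parameter.

    estimate, se: small-sample estimate and its standard error.
    prior_mean, prior_sd: prior built from auxiliary item information."""
    w_data = 1.0 / se ** 2
    w_prior = 1.0 / prior_sd ** 2
    post_var = 1.0 / (w_data + w_prior)
    post_mean = post_var * (w_data * estimate + w_prior * prior_mean)
    return post_mean, post_var


# Hypothetical: an item feature predicts the threshold via 0.2 + 0.8 * feature.
feature = 1.0
prior_mean = 0.2 + 0.8 * feature
print(shrink(estimate=2.4, se=0.6, prior_mean=prior_mean, prior_sd=0.5))
```

The noisier the small-sample estimate (larger `se`), the more the posterior mean is pulled toward the value predicted from auxiliary information.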

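To make concrete what "the item parameters of a graded response model" are, here is a minimal sketch of the standard logistic graded response model for a single item: a discrimination `a` and ordered thresholds `b_1 < ... < b_{K-1}` determine the probability of each response category. The function name `grm_category_probs` is hypothetical, and the sketch only computes model probabilities; it does not implement the MMLE, empirical Bayes, or hierarchical Bayes estimation compared in the paper, nor the authors' R functions.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta      -- latent trait value of the respondent
    a          -- item discrimination
    thresholds -- strictly increasing category thresholds b_1 < ... < b_{K-1}
    Returns a list of K probabilities for categories 0..K-1.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative boundary curves P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference between adjacent boundary curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Example: a 4-category item with symmetric thresholds, respondent at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # [0.269, 0.231, 0.231, 0.269]
```

Estimating `a` and the thresholds from data is the hard part when samples are small, which is where priors built from auxiliary item information come in.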


Citations: 1

A Bayesian Random Weights Linear Logistic Test Model for Within-Test Practice Effects

Zone 4 (Psychology); Q4 PSYCHOLOGY, MATHEMATICAL

Applied Psychological Measurement

Pub Date : 2023-11-01 DOI: 10.1177/01466216231209752

José H. Lozano, Javier Revuelta

The present paper introduces a random weights linear logistic test model for measuring individual differences in operation-specific practice effects within a single administration of a test. The proposed model extends the linear logistic test model of learning developed by Spada (1977) by treating the practice effects as random effects that vary across examinees. A Bayesian framework was used for model estimation and evaluation. A simulation study was conducted to examine how the model behaves when combined with the Bayesian procedures. The results showed that the estimation and evaluation methods performed well. Additionally, an empirical study was conducted to illustrate the applicability of the model to real data. Applied to a sample of responses from a logical ability test, the model provided evidence of individual differences in operation-specific practice effects.

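The core idea can be illustrated with a small sketch. In a linear logistic test model (LLTM), item difficulty is a weighted sum of the difficulties of the cognitive operations the item requires (one row of a Q-matrix). Here we *assume* a practice term that lowers difficulty in proportion to how often each operation was already required by earlier items, with per-examinee practice weights `delta` drawn from a normal distribution to mimic "random weights": logit P = theta - (sum_j q_ij * eta_j + c - sum_j u_ij * delta_vj). This is a hypothetical parameterization for illustration only; Spada's (1977) specification and the authors' model may differ, and no Bayesian estimation is performed. All names (`prior_use_counts`, `p_correct`, the Q-matrix, and the parameter values) are made up.

```python
import math
import random


def prior_use_counts(Q):
    """For each item i, count how often each operation was required by items 0..i-1."""
    counts, running = [], [0] * len(Q[0])
    for row in Q:
        counts.append(list(running))
        running = [r + q for r, q in zip(running, row)]
    return counts


def p_correct(theta, q_row, eta, uses, delta, c=0.0):
    """Rasch-type success probability with an LLTM difficulty and an assumed practice term.

    q_row -- how often the item requires each operation (one Q-matrix row)
    eta   -- basic difficulty of each operation
    uses  -- how often each operation was required by earlier items
    delta -- this examinee's practice gain per earlier use of each operation
    """
    beta = sum(q * e for q, e in zip(q_row, eta)) + c   # LLTM item difficulty
    practice = sum(u * d for u, d in zip(uses, delta))  # accumulated practice gain
    return 1.0 / (1.0 + math.exp(-(theta - beta + practice)))


# Hypothetical 3-item test with 2 operations. Each examinee draws their own
# practice weights (the "random weights" part) from a normal distribution.
Q = [[1, 0], [1, 1], [0, 1]]
eta = [0.5, 0.8]
uses = prior_use_counts(Q)
rng = random.Random(1)
for examinee in range(2):
    delta = [rng.gauss(0.3, 0.1) for _ in eta]
    probs = [p_correct(0.0, Q[i], eta, uses[i], delta) for i in range(len(Q))]
    print(examinee, [round(p, 3) for p in probs])
```

Two examinees with the same ability `theta` can thus have different success probabilities on later items purely because their practice weights differ, which is the kind of individual difference the random weights model is designed to capture.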

Citations: 0