
Applied Psychological Measurement: Latest Publications

Maximum Marginal Likelihood Estimation of the MUPP-GGUM Model.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-04-19 | DOI: 10.1177/01466216251336925
Jianbin Fu
{"title":"Maximum Marginal Likelihood Estimation of the MUPP-GGUM Model.","authors":"Jianbin Fu","doi":"10.1177/01466216251336925","DOIUrl":"https://doi.org/10.1177/01466216251336925","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251336925"},"PeriodicalIF":1.0,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12009269/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143990880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Understanding Rater Cognition in Performance Assessment: A Mixed IRTree Approach.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-04-14 | DOI: 10.1177/01466216251333578
Hung-Yu Huang

When rater-mediated assessments are conducted, human raters often appraise the performance of ratees. However, challenges arise regarding the validity of raters' judgments in reflecting ratees' competencies according to scoring rubrics. Research on rater cognition suggests that both impersonal judgments and personal preferences can influence raters' judgmental processes. This study introduces a mixed IRTree-based model for rater judgments (MIM-R), which identifies professional and novice raters by sequentially applying the ideal-point and dominance item response theory (IRT) models to the cognitive process of raters. The simulation results demonstrate a satisfactory recovery of MIM-R parameters and highlight the importance of considering the mixed nature of raters in the rating process, as neglecting this leads to more biased estimations with an increasing proportion of novice raters. An empirical example of a creativity assessment is presented to illustrate the application and implications of MIM-R.
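As background for the two judgment processes this abstract contrasts, a dominance response function is monotone in the latent trait, while an ideal-point function peaks near the item (or rater) location. The minimal illustration below uses generic IRT notation in simplified form; it is not the authors' MIM-R parameterization.

```latex
% Illustrative contrast only; the exact MIM-R equations are given in the article.
% Dominance (2PL-type): probability increases monotonically in theta.
% Ideal point (unfolding-type, simplified): probability peaks at theta = delta.
\[
P_{\mathrm{dom}}(X = 1 \mid \theta) = \frac{\exp\{a(\theta - b)\}}{1 + \exp\{a(\theta - b)\}},
\qquad
P_{\mathrm{ideal}}(X = 1 \mid \theta) \propto \exp\{-a(\theta - \delta)^{2}\}.
\]
```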

Citations: 0
Accuracy in Invariance Detection With Multilevel Models With Three Estimators.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-24 | DOI: 10.1177/01466216251325644
W Holmes Finch, Cihan Demir, Brian F French, Thao Vo

Applied and simulation studies document model convergence and accuracy issues in differential item functioning (DIF) detection with multilevel models, hindering detection. This study aimed to evaluate the effectiveness of various estimation techniques in addressing these issues and to ensure robust DIF detection. We conducted a simulation study to investigate the performance of multilevel logistic regression models with predictors at level 2 across different estimation procedures, including maximum likelihood estimation (MLE), Bayesian estimation, and generalized estimating equations (GEE). The simulation results demonstrated that all three estimators maintained control over the Type I error rate across conditions. In most cases, GEE had comparable or higher power than MLE for identifying DIF, with Bayesian estimation having the lowest power. When potentially important covariates at levels 1 and 2 were included in the model, power for all methods was higher. These results suggest that in many cases where multilevel logistic regression is used for DIF detection, GEE offers a viable option for researchers, and that including important contextual variables at all levels of the data is desirable. Implications for practice are discussed.
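For readers who want to see what the GEE option looks like in practice, here is a minimal, hypothetical sketch (not the authors' code) using Python's statsmodels: a logistic model for item responses with a group indicator screening for uniform DIF, an ability covariate, and an exchangeable working correlation over level-2 clusters. All variable names and simulated effect sizes are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_clusters, n_per = 50, 40
cluster = np.repeat(np.arange(n_clusters), n_per)
group = rng.integers(0, 2, n_clusters)[cluster]        # focal (1) vs reference (0) group
ability = rng.normal(size=n_clusters * n_per)           # matching criterion
u = rng.normal(scale=0.5, size=n_clusters)[cluster]     # level-2 (cluster) effect
eta = -0.2 + 1.0 * ability + 0.4 * group + u            # 0.4 = simulated uniform DIF effect
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

df = pd.DataFrame({"y": y, "ability": ability, "group": group, "cluster": cluster})

# GEE logistic model with an exchangeable working correlation; the Wald test on
# `group` screens for uniform DIF (an ability-by-group term would add nonuniform DIF).
model = smf.gee("y ~ ability + group", groups="cluster", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```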

Citations: 0
Calculating Bias in Test Score Equating in a NEAT Design.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-24 | DOI: 10.1177/01466216251330305
Marie Wiberg, Inga Laukaityte

Test score equating is used to make scores from different test forms comparable, even when groups differ in ability. In practice, the non-equivalent groups with anchor test (NEAT) design is commonly used. The overall aim was to compare the amount of bias under different conditions when using either chained equating or frequency estimation with five different criterion functions: the identity function, linear equating, equipercentile equating, chained equating, and frequency estimation. We used real test data from a multiple-choice, binary-scored college admissions test to illustrate that the choice of criterion function matters. Further, we simulated data in line with the empirical data to examine differences in ability between groups, in item difficulty, in anchor and regular test form length, in the correlation between anchor and regular test forms, and in sample size. The results indicate that how bias is defined heavily affects the conclusions we draw about which equating method is to be preferred in different scenarios. Practical implications for standardized tests are given, together with recommendations on how to calculate bias when evaluating equating transformations.
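As a concrete reading of "bias" here, one common convention is the average signed difference between an estimated equating transformation and a chosen criterion function over the score points. The sketch below (illustrative only, not the authors' simulation design) shows that calculation; the criterion and estimated functions, and the optional score-point weights, are hypothetical inputs.

```python
import numpy as np

def equating_bias(estimated, criterion, weights=None):
    """Mean signed difference between an estimated equating function and a
    criterion function, both evaluated on the same raw-score points.
    `weights` can hold the score distribution to weight the average."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(criterion, dtype=float)
    if weights is None:
        return float(diff.mean())
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * diff) / np.sum(w))

# Toy example: raw scores 0..40 on form X equated to form Y.
scores = np.arange(41)
criterion = scores + 1.5                      # hypothetical criterion equating function
estimated = criterion + np.random.default_rng(0).normal(0, 0.3, scores.size)
print(round(equating_bias(estimated, criterion), 3))
```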

Citations: 0
On a Reparameterization of the MC-DINA Model.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-11 | DOI: 10.1177/01466216251324938
Lawrence T DeCarlo

The MC-DINA model is a cognitive diagnosis model (CDM) for multiple-choice items that was introduced by de la Torre (2009). The model extends the usual CDM in two basic ways: it allows for nominal responses instead of only dichotomous responses, and it allows skills to affect not only the choice of the correct response but also the choice of distractors. Here it is shown that the model can be re-expressed as a multinomial logit model with latent discrete predictors, that is, as a multinomial mixture model; a signal detection-like parameterization is also used. The reparameterization clarifies details about the structure and assumptions of the model, especially with respect to distractors, and helps to reveal parameter restrictions, which in turn have implications for psychological interpretations of the data and for issues with respect to statistical estimation. The approach suggests parsimonious models that are useful for practical applications, particularly for small sample sizes. The restrictions are shown to appear for items from the TIMSS 2007 fourth grade exam.
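As a schematic of the reparameterization described above, a multinomial logit with the latent attribute pattern as a discrete predictor, mixed over the attribute-pattern distribution, has the general form below. The specific MC-DINA and signal-detection-style restrictions on the category parameters are developed in the article itself.

```latex
% Generic form only; the MC-DINA restrictions on beta_{jc}(alpha) are given in the article.
\[
P(Y_{ij} = c \mid \boldsymbol{\alpha}_i)
  = \frac{\exp\{\beta_{jc}(\boldsymbol{\alpha}_i)\}}
         {\sum_{c'=1}^{C_j} \exp\{\beta_{jc'}(\boldsymbol{\alpha}_i)\}},
\qquad
P(Y_{ij} = c) = \sum_{\boldsymbol{\alpha}} \pi_{\boldsymbol{\alpha}}\,
                P(Y_{ij} = c \mid \boldsymbol{\alpha}).
\]
```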

Citations: 0
Modeling Within- and Between-Person Differences in the Use of the Middle Category in Likert Scales.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-02 | DOI: 10.1177/01466216251322285
Jesper Tijmstra, Maria Bolsinova

When using Likert scales, the inclusion of a middle-category response option poses a challenge for the valid measurement of the psychological attribute of interest. While this middle category is often included to provide respondents with a neutral response option, respondents may in practice also select this category when they do not want to or cannot give an informative response. If one analyzes the response data without considering these two possible uses of the middle response category, measurement may be confounded. In this paper, we propose a response-mixture IRTree model for the analysis of Likert-scale data. This model acknowledges that the middle response category can either be selected as a non-response option (and hence be uninformative for the attribute of interest) or to communicate a neutral position (and hence be informative), and that this choice depends on both person- and item-characteristics. For each observed middle-category response, the probability that it was intended to be informative is modeled, and both the attribute of substantive interest and a non-response tendency are estimated. The performance of the model is evaluated in a simulation study, and the procedure is applied to empirical data from personality psychology.
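The core idea can be sketched as a two-component mixture for an observed middle-category response. This is a generic schematic of the mechanism described above, not the authors' exact IRTree parameterization.

```latex
% Schematic only: pi_{ij} is the probability that the observed middle response of
% person i on item j is non-informative (non-response use); otherwise the response
% reflects a genuinely neutral position under the trait theta_i.
\[
P(X_{ij} = \text{middle})
  = \pi_{ij}\, P(\text{non-response}_{ij})
  + (1 - \pi_{ij})\, P(\text{neutral}_{ij} \mid \theta_i).
\]
```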

Citations: 0
Weighted Answer Similarity Analysis.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | DOI: 10.1177/01466216251322353
Nicholas Trout, Kylie Gorney

Romero et al. (2015; see also Wollack, 1997) developed the ω statistic as a method for detecting unusually similar answers between pairs of examinees. For each pair, the ω statistic considers whether the observed number of similar answers is significantly larger than the expected number of similar answers. However, one limitation of ω is that it does not account for the particular items on which similar answers are observed. Therefore, in this study, we propose a weighted version of the ω statistic that takes this information into account. We compare the performance of the new and existing statistics using detailed simulations in which several factors are manipulated. Results show that while both the new and existing statistics are able to control the Type I error rate, the new statistic is more powerful, on average.
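To make the underlying logic concrete, here is a toy sketch (not the published ω or its weighted extension, which condition on a nominal response model and, in the new statistic, additionally weight items): the observed number of matching answers for a pair is standardized against its expectation under item-level chance-match probabilities, treated as independent Bernoulli trials. The match probabilities below are hypothetical.

```python
import numpy as np

def similarity_z(matches, match_probs):
    """Standardized count of matching answers for a pair of examinees.

    matches:      1/0 indicator per item of whether the pair gave the same answer
    match_probs:  model-implied probability of a chance match on each item
    """
    matches = np.asarray(matches, dtype=float)
    p = np.asarray(match_probs, dtype=float)
    observed = matches.sum()
    expected = p.sum()
    sd = np.sqrt(np.sum(p * (1 - p)))   # matches treated as independent Bernoulli trials
    return (observed - expected) / sd

# Toy example with hypothetical chance-match probabilities for 10 items.
print(round(similarity_z([1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
                         [0.35, 0.4, 0.3, 0.25, 0.5, 0.45, 0.3, 0.4, 0.35, 0.3]), 2))
```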

Citations: 0
Optimal Test Design for Estimation of Mean Ability Growth.
IF 1.2 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | Epub Date: 2024-10-15 | DOI: 10.1177/01466216241291233
Jonas Bjermo

The design of an achievement test is crucial for many reasons. This article focuses on a population's ability growth between school grades. We define design as the allocation of test items with respect to their difficulties. The objective is to present an optimal test design method for estimating the mean and percentile ability growth with good precision. We use the asymptotic expression of the variance in terms of the test information. With that criterion for optimization, we propose to use particle swarm optimization to find the optimal design. The results show that the allocation of the item difficulties depends on item discrimination and the magnitude of the ability growth. The optimization function depends on the examinees' abilities and hence on the value of the unknown mean ability growth. Therefore, we also use an optimum-in-average design and conclude that it is robust to uncertainty in the mean ability growth. In practice, a test is assembled from items stored in an item pool with calibrated item parameters. Hence, we also perform a discrete optimization using simulated annealing and compare the results to the particle swarm optimization.
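To illustrate the kind of search involved, the sketch below runs a basic particle swarm over the difficulties of a fixed-length 2PL test, maximizing Fisher information at a single hypothetical target ability. This is a generic PSO template under assumed tuning constants, not the authors' criterion, which targets the precision of estimated mean ability growth.

```python
import numpy as np

rng = np.random.default_rng(0)

def test_information(b, theta=0.5, a=1.2):
    """Sum of 2PL item information at ability theta for a difficulty vector b."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return np.sum(a**2 * p * (1 - p))

def pso(n_items=20, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    # Each particle is a candidate vector of item difficulties in [-3, 3].
    x = rng.uniform(-3, 3, (n_particles, n_items))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([test_information(p) for p in x])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update
        x = np.clip(x + v, -3, 3)
        vals = np.array([test_information(p) for p in x])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest, pbest_val.max()

b_opt, info = pso()
print(np.round(np.sort(b_opt), 2), round(info, 2))
```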

Citations: 0
A Two-Step Q-Matrix Estimation Method.
IF 1.2 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | Epub Date: 2024-10-10 | DOI: 10.1177/01466216241284418
Hans-Friedrich Köhn, Chia-Yi Chiu, Olasumbo Oluwalana, Hyunjoo Kim, Jiaxi Wang

Cognitive Diagnosis Models in educational measurement are restricted latent class models that describe ability in a knowledge domain as a composite of latent skills an examinee may have mastered or failed to master. Different combinations of skills define distinct latent proficiency classes to which examinees are assigned based on test performance. Items of cognitively diagnostic assessments are characterized by skill profiles specifying which skills are required for a correct item response. The item-skill profiles of a test form its Q-matrix. The validity of cognitive diagnosis depends crucially on the correct specification of the Q-matrix. Typically, Q-matrices are determined by curricular experts. However, expert judgment is fallible. Data-driven estimation methods have been developed with the promise of greater accuracy in identifying the Q-matrix of a test. Yet, many of the extant methods encounter computational feasibility issues, either in the form of excessive CPU time or inadmissible estimates. In this article, a two-step algorithm for estimating the Q-matrix is proposed that can be used with any cognitive diagnosis model. Simulations showed that the new method outperformed extant estimation algorithms and was computationally more efficient. It was also applied to Tatsuoka's famous fraction-subtraction data. The paper concludes with a discussion of the theoretical and practical implications of the findings.
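For readers unfamiliar with the Q-matrix formalism, the sketch below uses standard conjunctive (DINA-type) notation to show how a hypothetical Q-matrix and hypothetical skill profiles determine ideal responses; it illustrates the object being estimated, not the proposed two-step estimator.

```python
import numpy as np

# Hypothetical Q-matrix: 4 items by 3 skills (1 = the item requires that skill).
Q = np.array([[1, 0, 0],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])

# Hypothetical skill profiles for 3 examinees (1 = skill mastered).
alpha = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [1, 1, 1]])

# Conjunctive (DINA-type) ideal response: eta_ij = prod_k alpha_ik ** q_jk,
# i.e., 1 only when the examinee masters every skill the item requires.
eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(int)
print(eta)   # rows = examinees, columns = items
```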

Citations: 0
The Improved EMS Algorithm for Latent Variable Selection in M3PL Model.
IF 1.2 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-03-01 | Epub Date: 2024-10-21 | DOI: 10.1177/01466216241291237
Laixu Shang, Ping-Feng Xu, Na Shan, Man-Lai Tang, Qian-Zhen Zheng

One of the main concerns in multidimensional item response theory (MIRT) is to detect the relationship between items and latent traits, which can be treated as a latent variable selection problem. An attractive method for latent variable selection in the multidimensional 2-parameter logistic (M2PL) model is to minimize the observed Bayesian information criterion (BIC) by the expectation model selection (EMS) algorithm. The EMS algorithm extends the EM algorithm and allows the model (e.g., the loading structure in MIRT) to be updated in the iterations along with the parameters under the model. As an extension of the M2PL model, the multidimensional 3-parameter logistic (M3PL) model introduces an additional guessing parameter, which makes latent variable selection more challenging. In this paper, a well-designed EMS algorithm, named improved EMS (IEMS), is proposed to accurately and efficiently detect the underlying true loading structure in the M3PL model; it also works for the M2PL model. In simulation studies, we compare the IEMS algorithm with several state-of-the-art methods, and the IEMS is competitive in terms of model recovery, estimation precision, and computational efficiency. The IEMS algorithm is illustrated by its application to two real data sets.
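For reference, the M3PL item response function referred to above is shown here in its standard slope-intercept form with a guessing (lower-asymptote) parameter; the article's own notation for the loading structure may differ.

```latex
% Standard multidimensional 3PL item response function with guessing c_j,
% discrimination (loading) vector a_j, intercept d_j, and latent trait vector theta_i.
\[
P(Y_{ij} = 1 \mid \boldsymbol{\theta}_i)
  = c_j + \frac{1 - c_j}{1 + \exp\{-(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j)\}} .
\]
```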

Citations: 0