
Applied Psychological Measurement: Latest Publications

Sequential Bayesian Ability Estimation Applied to Mixed-Format Item Tests.
IF 1.0 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-09-01 | Epub Date: 2023-09-08 | DOI: 10.1177/01466216231201986
Jiawei Xiong, Allan S Cohen, Xinhui Maggie Xiong

Large-scale tests often contain mixed-format items, such as when multiple-choice (MC) items and constructed-response (CR) items are both contained in the same test. Although previous research has analyzed both types of items simultaneously, this may not always provide the best estimate of ability. In this paper, a two-step sequential Bayesian (SB) analytic method under the concept of empirical Bayes is explored for mixed item response models. This method integrates ability estimates from different item formats. Unlike the empirical Bayes method, the SB method estimates examinees' posterior ability parameters with individual-level sample-dependent prior distributions estimated from the MC items. Simulations were used to evaluate the accuracy of recovery of ability and item parameters over four factors: the type of the ability distribution, sample size, test length (number of items for each item type), and person/item parameter estimation method. The SB method was compared with a traditional concurrent Bayesian (CB) calibration method, EAPsum, that uses scaled scores for summed scores to estimate parameters from the MC and CR items simultaneously in one estimation step. From the simulation results, the SB method showed more accurate and reliable ability estimation than the CB method, especially when the sample size was small (150 and 500). Both methods presented similar recovery results for MC item parameters, but the CB method yielded a bit better recovery of the CR item parameters. The empirical example suggested that posterior ability estimated by the proposed SB method had higher reliability than the CB method.
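The two-step idea in this abstract can be sketched numerically. The example below is an illustration rather than the authors' implementation: it assumes a 2PL model for the MC items and a graded response model for the CR items, computes the MC posterior on a grid, and reuses it as the individual-level prior when the CR items are scored. All item parameters, responses, and function names are made up.

```python
import numpy as np
from scipy.stats import norm

def mc_likelihood(theta, responses, a, b):
    """2PL likelihood of dichotomous MC responses at each theta grid point."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)

def grm_likelihood(theta, responses, a, thresholds):
    """Graded response model likelihood of polytomous CR responses."""
    like = np.ones_like(theta)
    for j, x in enumerate(responses):
        # cumulative probabilities P(X >= k), padded with 1 and 0
        cum = 1.0 / (1.0 + np.exp(-a[j] * (theta[:, None] - thresholds[j][None, :])))
        cum = np.hstack([np.ones((theta.size, 1)), cum, np.zeros((theta.size, 1))])
        like *= cum[:, x] - cum[:, x + 1]
    return like

theta_grid = np.linspace(-4, 4, 81)

# Step 1: posterior from the MC items under a standard normal prior
a_mc, b_mc = np.array([1.2, 0.9, 1.5]), np.array([-0.5, 0.0, 0.8])
mc_resp = np.array([1, 1, 0])
post_mc = norm.pdf(theta_grid) * mc_likelihood(theta_grid, mc_resp, a_mc, b_mc)
post_mc /= np.trapz(post_mc, theta_grid)

# Step 2: the MC posterior is the individual-level prior for the CR items
a_cr = np.array([1.1, 0.8])
thr_cr = [np.array([-1.0, 0.5]), np.array([-0.5, 1.0])]   # 3 score categories each
cr_resp = np.array([2, 1])
post_sb = post_mc * grm_likelihood(theta_grid, cr_resp, a_cr, thr_cr)
post_sb /= np.trapz(post_sb, theta_grid)

eap = np.trapz(theta_grid * post_sb, theta_grid)
print(f"sequential EAP estimate: {eap:.3f}")
```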

Citations: 0
Modeling Rating Order Effects Under Item Response Theory Models for Rater-Mediated Assessments.
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-06-01 | DOI: 10.1177/01466216231174566
Hung-Yu Huang

Rater effects are commonly observed in rater-mediated assessments. By using item response theory (IRT) modeling, raters can be treated as independent factors that function as instruments for measuring ratees. Most rater effects are static and can be addressed appropriately within an IRT framework, and a few models have been developed for dynamic rater effects. Operational rating projects often require human raters to continuously and repeatedly score ratees over a certain period, imposing a burden on the cognitive processing abilities and attention spans of raters that stems from judgment fatigue and thus affects the rating quality observed during the rating period. As a result, ratees' scores may be influenced by the order in which they are graded by raters in a rating sequence, and the rating order effect should be considered in new IRT models. In this study, two types of many-faceted (MF)-IRT models are developed to account for such dynamic rater effects, which assume that rater severity can drift systematically or stochastically. The results obtained from two simulation studies indicate that the parameters of the newly developed models can be estimated satisfactorily using Bayesian estimation and that disregarding the rating order effect produces biased model structure and ratee proficiency parameter estimations. A creativity assessment is outlined to demonstrate the application of the new models and to investigate the consequences of failing to detect the possible rating order effect in a real rater-mediated evaluation.
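A minimal data-generating sketch of a rating order effect is given below. It assumes dichotomous ratings under a Rasch-type rater model in which severity grows linearly with the rater's position in the rating sequence; the drift size, sample sizes, and names are illustrative, not the authors' MF-IRT specification (which also allows stochastic drift and polytomous scores).

```python
import numpy as np

rng = np.random.default_rng(7)

n_ratees, n_raters = 200, 5
theta = rng.normal(0, 1, n_ratees)           # ratee proficiency
severity0 = rng.normal(0, 0.5, n_raters)     # baseline rater severity
drift = 0.02                                 # assumed systematic severity drift per rating

def rasch_prob(theta_i, severity):
    """Probability of a positive rating under a simple Rasch-type rater model."""
    return 1.0 / (1.0 + np.exp(-(theta_i - severity)))

scores = np.zeros((n_ratees, n_raters), dtype=int)
for r in range(n_raters):
    order = rng.permutation(n_ratees)        # the sequence in which rater r grades ratees
    for position, i in enumerate(order):
        sev = severity0[r] + drift * position   # severity rises with fatigue over the sequence
        scores[i, r] = rng.binomial(1, rasch_prob(theta[i], sev))

# an estimation model that ignores `drift` would attribute this trend to the ratees
```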

Citations: 1
A Mixed Sequential IRT Model for Mixed-Format Items.
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-06-01 | Epub Date: 2023-03-17 | DOI: 10.1177/01466216231165302
Junhuan Wei, Yan Cai, Dongbo Tu

To provide more insight into an individual's response process and cognitive process, this study proposed three mixed sequential item response models (MS-IRMs) for mixed-format items consisting of a mixture of a multiple-choice item and an open-ended item that emphasize a sequential response process and are scored sequentially. Relative to existing polytomous models such as the graded response model (GRM), generalized partial credit model (GPCM), or traditional sequential Rasch model (SRM), the proposed models employ an appropriate processing function for each task to improve conventional polytomous models. Simulation studies were carried out to investigate the performance of the proposed models, and the results indicated that all proposed models outperformed the SRM, GRM, and GPCM in terms of parameter recovery and model fit. An application illustration of the MS-IRMs in comparison with traditional models was demonstrated by using real data from TIMSS 2007.
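The sequential scoring process described above can be written down directly: the examinee reaches the open-ended task only after passing the multiple-choice task. The sketch below is a generic two-step sequential model with logistic processing functions; the parameters are illustrative and it does not reproduce the authors' three specific MS-IRMs.

```python
import numpy as np

def sequential_probs(theta, a1, b1, a2, b2):
    """Score-category probabilities for one mixed-format item scored sequentially:
    step 1 (MC task) must be passed before step 2 (open-ended task) is attempted."""
    p1 = 1.0 / (1.0 + np.exp(-a1 * (theta - b1)))   # pass the MC task
    p2 = 1.0 / (1.0 + np.exp(-a2 * (theta - b2)))   # pass the CR task, given step 1
    return np.stack([1.0 - p1, p1 * (1.0 - p2), p1 * p2], axis=-1)

theta = np.linspace(-3, 3, 7)
probs = sequential_probs(theta, a1=1.2, b1=-0.2, a2=0.9, b2=0.6)
print(probs.round(3))      # each row sums to 1 over the three score categories (0, 1, 2)
```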

Citations: 0
Online Parameter Estimation for Student Evaluation of Teaching.
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-06-01 | Epub Date: 2023-03-19 | DOI: 10.1177/01466216231165314
Chia-Wen Chen, Chen-Wei Liu

Student evaluation of teaching (SET) assesses students' experiences in a class to evaluate teachers' performance in class. SET essentially comprises three facets: teaching proficiency, student rating harshness, and item properties. The computerized adaptive testing form of SET with an established item pool has been used in educational environments. However, conventional scoring methods ignore the harshness of students toward teachers and, therefore, are unable to provide a valid assessment. In addition, simultaneously estimating teachers' teaching proficiency and students' harshness remains an unaddressed issue in the context of online SET. In the current study, we develop and compare three novel methods (marginal, iterative once, and hybrid approaches) to improve the precision of parameter estimations. A simulation study is conducted to demonstrate that the hybrid method is a promising technique that can substantially outperform traditional methods.
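As a rough illustration of the three facets named above, the following sketch simulates dichotomous SET endorsements from teacher proficiency, student harshness, and item thresholds. It only generates data under an assumed logistic model and does not implement the marginal, iterative once, or hybrid estimators; all names and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

n_students, n_teachers, n_items = 300, 20, 8
proficiency = rng.normal(0, 1, n_teachers)   # teacher teaching proficiency
harshness = rng.normal(0, 0.7, n_students)   # student rating harshness
difficulty = rng.normal(0, 0.5, n_items)     # item endorsement thresholds

def endorse_prob(teacher, student, item):
    """Probability that a student endorses an SET item about a teacher (assumed model)."""
    return 1.0 / (1.0 + np.exp(-(proficiency[teacher] - harshness[student] - difficulty[item])))

# each student rates one randomly assigned teacher on all items
assigned = rng.integers(0, n_teachers, n_students)
ratings = np.array([[rng.binomial(1, endorse_prob(assigned[s], s, i))
                     for i in range(n_items)] for s in range(n_students)])
# a scoring method that ignores `harshness` will confound it with `proficiency`
```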

Citations: 0
Using a Generalized Logistic Regression Method to Detect Differential Item Functioning With Multiple Groups in Cognitive Diagnostic Tests.
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-06-01 | Epub Date: 2023-05-13 | DOI: 10.1177/01466216231174559
Xiaojian Sun, Shimeng Wang, Lei Guo, Tao Xin, Naiqing Song

Items with the presence of differential item functioning (DIF) will compromise the validity and fairness of a test. Studies have investigated the DIF effect in the context of cognitive diagnostic assessment (CDA), and some DIF detection methods have been proposed. Most of these methods are designed to test for the presence of DIF between two groups; however, empirical situations may contain more than two groups. To date, only a handful of studies have detected the DIF effect with multiple groups in the CDA context. This study uses the generalized logistic regression (GLR) method to detect DIF items by using the estimated attribute profile as the matching criterion. A simulation study is conducted to examine the performance of the two GLR methods, the GLR-based Wald test (GLR-Wald) and the GLR-based likelihood ratio test (GLR-LRT), in detecting the DIF items; results based on the ordinary Wald test are also reported. Results show that (1) both GLR-Wald and GLR-LRT have more reasonable performance in controlling Type I error rates than the ordinary Wald test in most conditions; (2) the GLR method also produces higher empirical rejection rates than the ordinary Wald test in most conditions; and (3) using the estimated attribute profile as the matching criterion can produce similar Type I error rates and empirical rejection rates for GLR-Wald and GLR-LRT. A real data example is also analyzed to illustrate the application of these DIF detection methods in multiple groups.
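A hedged sketch of the GLR idea for uniform DIF with more than two groups: match on the estimated attribute profile, add group dummies, and compare the nested logistic regressions with a likelihood ratio test (the Wald variant would instead test the group coefficients of the augmented model jointly). The data, effect sizes, and use of statsmodels here are illustrative assumptions, not the authors' simulation design.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 1000
# estimated attribute mastery profile (3 attributes) and group membership (3 groups)
alpha_hat = rng.binomial(1, 0.6, size=(n, 3))
group = rng.integers(0, 3, n)
# illustrative item responses: depend on mastery; group 2 gets an artificial advantage (DIF)
eta = -1.0 + 1.5 * alpha_hat.sum(axis=1) + 0.6 * (group == 2)
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

d = pd.DataFrame(alpha_hat, columns=["a1", "a2", "a3"])
d["y"], d["group"] = y, group
G = pd.get_dummies(d["group"], prefix="g", drop_first=True).astype(float)

# compact model: matching on the estimated attribute profile only
X0 = sm.add_constant(d[["a1", "a2", "a3"]].astype(float))
m0 = sm.Logit(d["y"], X0).fit(disp=0)

# augmented model: adds group main effects (uniform DIF across multiple groups)
X1 = sm.add_constant(pd.concat([d[["a1", "a2", "a3"]].astype(float), G], axis=1))
m1 = sm.Logit(d["y"], X1).fit(disp=0)

lrt = 2 * (m1.llf - m0.llf)
p_value = chi2.sf(lrt, df=X1.shape[1] - X0.shape[1])
print(f"GLR-LRT = {lrt:.2f}, p = {p_value:.4f}")
```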

Citations: 0
The Impact of Item Model Parameter Variations on Person Parameter Estimation in Computerized Adaptive Testing With Automatically Generated Items.
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-06-01 | Epub Date: 2023-03-17 | DOI: 10.1177/01466216231165313
Chen Tian, Jaehwa Choi

Sibling items developed through automatic item generation share similar but not identical psychometric properties. However, considering sibling item variations may bring substantial computational difficulties and little improvement in scoring. Assuming identical characteristics among siblings, this study explores the impact of item model parameter variations (i.e., within-family variation between siblings) on person parameter estimation in linear tests and Computerized Adaptive Testing (CAT). Specifically, we explore (1) what happens if small/medium/large within-family variance is ignored, (2) if the effect of larger within-model variance can be compensated by greater test length, (3) if the item model pool properties affect the impact of within-family variance on scoring, and (4) if the issues in (1) and (2) are different in linear vs. adaptive testing. A related-siblings model is used for data generation, and an identical-siblings model is assumed for scoring. Manipulated factors include test length, the size of within-model variation, and item model pool characteristics. Results show that as within-family variance increases, the standard error of scores remains at similar levels. For the correlation between true and estimated scores and for RMSE, the effect of the larger within-model variance was compensated by test length. For bias, scores are biased towards the center, and bias was not compensated by test length. Although the within-family variation is random in the current simulations, to yield less biased ability estimates, the item model pool should provide balanced opportunities such that "fake-easy" and "fake-difficult" item instances cancel their effects. The results for CAT are similar to those for linear tests, except for higher efficiency.
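The within-family variation can be mimicked with a few lines of simulation: item instances are drawn around their parent item models, but scoring proceeds as if each instance had its parent's parameters. The sketch below is an assumed 2PL setup with a grid-search ML estimate, not the authors' design; the within-family SD and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

n_models, theta_true = 30, 0.5
# parent item-model parameters
a_parent = rng.lognormal(mean=0.0, sigma=0.2, size=n_models)
b_parent = rng.normal(0.0, 1.0, n_models)

sigma_within = 0.4   # assumed within-family (between-sibling) SD of difficulty

# the instance actually administered is a sibling drawn around its parent
b_instance = b_parent + rng.normal(0.0, sigma_within, n_models)
p = 1 / (1 + np.exp(-a_parent * (theta_true - b_instance)))
responses = rng.binomial(1, p)

# scoring ignores sibling variation: grid-search ML estimate using the parent parameters
grid = np.linspace(-4, 4, 161)
p_parent = 1 / (1 + np.exp(-a_parent * (grid[:, None] - b_parent)))
loglik = (responses[None, :] * np.log(p_parent)
          + (1 - responses[None, :]) * np.log(1 - p_parent)).sum(axis=1)
theta_hat = grid[np.argmax(loglik)]
print(f"true theta = {theta_true}, estimate under the identical-siblings assumption = {theta_hat:.2f}")
```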

Citations: 0
A New Approach to Desirable Responding: Multidimensional Item Response Model of Overclaiming Data.
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-05-01 | Epub Date: 2023-01-19 | DOI: 10.1177/01466216231151704
Kuan-Yu Jin, Delroy L Paulhus, Ching-Lin Shih

A variety of approaches have been presented for assessing desirable responding in self-report measures. Among them, the overclaiming technique asks respondents to rate their familiarity with a large set of real and nonexistent items (foils). The application of signal detection formulas to the endorsement rates of real items and foils yields indices of (a) knowledge accuracy and (b) knowledge bias. This overclaiming technique reflects both cognitive ability and personality. Here, we develop an alternative measurement model based on multidimensional item response theory (MIRT). We report three studies demonstrating this new model's capacity to analyze overclaiming data. First, a simulation study illustrates that MIRT and signal detection theory yield comparable indices of accuracy and bias, although MIRT provides important additional information. Two empirical examples, one based on mathematical terms and one based on Chinese idioms, are then elaborated. Together, they demonstrate the utility of this new approach for group comparisons and item selection. The implications of this research are illustrated and discussed.
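The signal detection computation mentioned above is easy to make concrete. The sketch below uses one common variant (d' for knowledge accuracy and the criterion c for knowledge bias) applied to dichotomized familiarity claims; the clamping rule and the toy data are assumptions, and the exact index definitions in the overclaiming literature vary.

```python
import numpy as np
from scipy.stats import norm

def overclaiming_indices(real_claims, foil_claims):
    """Signal-detection indices from dichotomized familiarity claims.
    real_claims / foil_claims: 0/1 arrays of 'I know this' answers to real items and foils."""
    # clamp rates away from 0 and 1 so the z-transform stays finite
    hr = np.clip(real_claims.mean(), 0.5 / real_claims.size, 1 - 0.5 / real_claims.size)
    far = np.clip(foil_claims.mean(), 0.5 / foil_claims.size, 1 - 0.5 / foil_claims.size)
    accuracy = norm.ppf(hr) - norm.ppf(far)          # d': knowledge accuracy
    bias = -0.5 * (norm.ppf(hr) + norm.ppf(far))     # c : knowledge bias (lower = more overclaiming)
    return accuracy, bias

acc, bias = overclaiming_indices(np.array([1, 1, 1, 0, 1, 1]), np.array([1, 0, 0, 1]))
print(f"accuracy (d') = {acc:.2f}, bias (c) = {bias:.2f}")
```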

Citations: 0
A Testlet Diagnostic Classification Model with Attribute Hierarchies.
IF 1.0 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-05-01 | Epub Date: 2023-03-21 | DOI: 10.1177/01466216231165315
Wenchao Ma, Chun Wang, Jiaying Xiao

In this article, a testlet hierarchical diagnostic classification model (TH-DCM) was introduced to take both attribute hierarchies and item bundles into account. The expectation-maximization algorithm with an analytic dimension reduction technique was used for parameter estimation. A simulation study was conducted to assess the parameter recovery of the proposed model under varied conditions, and to compare TH-DCM with testlet higher-order CDM (THO-DCM; Hansen, M. (2013). Hierarchical item response models for cognitive diagnosis (Unpublished doctoral dissertation). UCLA; Zhan, P., Li, X., Wang, W.-C., Bian, Y., & Wang, L. (2015). The multidimensional testlet-effect cognitive diagnostic models. Acta Psychologica Sinica, 47(5), 689. https://doi.org/10.3724/SP.J.1041.2015.00689). Results showed that (1) ignoring large testlet effects worsened parameter recovery, (2) DCMs assuming equal testlet effects within each testlet performed as well as the testlet model assuming unequal testlet effects under most conditions, (3) misspecifications in the joint attribute distribution had a differential impact on parameter recovery, and (4) THO-DCM seems to be a robust alternative to TH-DCM under some hierarchical structures. A set of real data was also analyzed for illustration.
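Two ingredients of the model class can be illustrated compactly: which attribute profiles a hierarchy permits, and how a testlet effect can enter an item response function. The sketch below assumes a linear hierarchy and a DINA-type kernel with an additive testlet effect on the logit; it is a schematic reading of the setup, not the authors' TH-DCM parameterization.

```python
import itertools
import numpy as np

# prerequisite matrix: H[j, k] = 1 means attribute j must be mastered before attribute k
# (a linear hierarchy A1 -> A2 -> A3 is assumed for illustration)
H = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])

def permissible(profile, H):
    """A profile respects the hierarchy if every mastered attribute has its prerequisites mastered."""
    for j, k in zip(*np.nonzero(H)):
        if profile[k] == 1 and profile[j] == 0:
            return False
    return True

profiles = [p for p in itertools.product([0, 1], repeat=3) if permissible(p, H)]
print(profiles)   # 4 of the 8 unrestricted profiles survive: (0,0,0), (1,0,0), (1,1,0), (1,1,1)

def dina_testlet_prob(profile, q_row, guess, slip, testlet_effect):
    """DINA-type success probability with an additive testlet effect on the logit (illustrative)."""
    eta = int(all(profile[k] >= q_row[k] for k in range(len(q_row))))
    base = (1 - slip) if eta else guess
    logit = np.log(base / (1 - base)) + testlet_effect
    return 1 / (1 + np.exp(-logit))

print(dina_testlet_prob((1, 1, 0), [1, 1, 0], guess=0.2, slip=0.1, testlet_effect=0.5))
```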

Citations: 0
On the Folly of Introducing A (Time-Based UMV), While Designing for B (Time-Based CMV).
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-05-01 | Epub Date: 2023-03-15 | DOI: 10.1177/01466216231165304
Alice Brawley Newlin
{"title":"On the Folly of Introducing A (Time-Based UMV), While Designing for B (Time-Based CMV).","authors":"Alice Brawley Newlin","doi":"10.1177/01466216231165304","DOIUrl":"10.1177/01466216231165304","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 3","pages":"253-256"},"PeriodicalIF":1.2,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126389/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9363899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing Computerized Adaptive Testing with Batteries of Unidimensional Tests.
IF 1.2 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2023-05-01 | Epub Date: 2023-03-24 | DOI: 10.1177/01466216231165301
Pasquale Anselmi, Egidio Robusto, Francesca Cristante

The article presents a new computerized adaptive testing (CAT) procedure for use with batteries of unidimensional tests. At each step of testing, the estimate of a certain ability is updated on the basis of the response to the latest administered item and the current estimates of all other abilities measured by the battery. The information deriving from these abilities is incorporated into an empirical prior that is updated each time that new estimates of the abilities are computed. In two simulation studies, the performance of the proposed procedure is compared with that of a standard procedure for CAT with batteries of unidimensional tests. The proposed procedure yields more accurate ability estimates in fixed-length CATs, and a reduction of test length in variable-length CATs. These gains in accuracy and efficiency increase with the correlation between the abilities measured by the batteries.
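The empirical-prior idea can be sketched as an EAP update in which the prior for the ability currently being tested is centered on a prediction from the other ability estimates in the battery. The regression weights, prior SD, and item parameters below are illustrative assumptions, not the procedure's actual updating rule.

```python
import numpy as np
from scipy.stats import norm

theta_grid = np.linspace(-4, 4, 81)

def eap_update(prior_density, response, a, b):
    """One EAP step for a dichotomous 2PL item, given a prior evaluated on theta_grid."""
    p = 1 / (1 + np.exp(-a * (theta_grid - b)))
    post = prior_density * (p if response == 1 else 1 - p)
    post /= np.trapz(post, theta_grid)
    return post, np.trapz(theta_grid * post, theta_grid)

# empirical prior for the current test: centered on a prediction from the
# abilities already estimated in the battery (all values below are illustrative)
other_estimates = np.array([0.8, 0.3])   # current estimates of the other abilities
weights = np.array([0.5, 0.2])           # assumed regression weights derived from the correlations
prior_mean = weights @ other_estimates
prior_sd = 0.8                           # residual SD after borrowing that information
prior = norm.pdf(theta_grid, loc=prior_mean, scale=prior_sd)

post, eap = eap_update(prior, response=1, a=1.3, b=0.2)
print(f"EAP after one item with the empirical prior: {eap:.3f}")
```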

Citations: 0