Latest Articles in Applied Psychological Measurement

Termination Criteria for Grid Multiclassification Adaptive Testing With Multidimensional Polytomous Items.
IF 1 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2022-10-01 | Epub Date: 2022-06-16 | DOI: 10.1177/01466216221108383
Zhuoran Wang, Chun Wang, David J Weiss

Adaptive classification testing (ACT) is a variation of computerized adaptive testing (CAT) developed to efficiently classify examinees into multiple groups based on predetermined cutoffs. In multidimensional multiclassification (i.e., more than two categories exist along each dimension), grid classification is proposed to classify each examinee into one of the grids encircled by cutoffs (lines/surfaces) along different dimensions, so as to provide clearer information regarding an examinee's relative standing along each dimension and facilitate subsequent treatment and intervention. In this article, the sequential probability ratio test (SPRT) and the confidence interval method were implemented in grid multiclassification ACT. In addition, two new termination criteria, the grid classification generalized likelihood ratio (GGLR) and the simplified grid classification generalized likelihood ratio, were proposed for grid multiclassification ACT. Simulation studies using both a simulated item bank and a real item bank with polytomous multidimensional items show that grid multiclassification ACT is more efficient than classification based on measurement CAT, which focuses on trait estimate precision. In the context of a high-quality bank, GGLR was found to terminate the grid multiclassification ACT and classify examinees most efficiently.
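The SPRT termination criterion accumulates a log-likelihood ratio comparing two hypothesized ability values on either side of a cutoff and stops once the ratio crosses a decision bound. Below is a minimal, hedged sketch of that rule for a single cutoff under a unidimensional 2PL model; the function names, indifference-region width `delta`, and error rates are illustrative assumptions, not the authors' implementation, which extends to multidimensional polytomous items and grids of cutoffs.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def sprt_decision(responses, a, b, cutoff, delta=0.3, alpha=0.05, beta=0.05):
    """SPRT for classifying theta below vs. above a single cutoff.

    Compares the likelihood at theta = cutoff + delta against
    theta = cutoff - delta; returns 'above', 'below', or 'continue'.
    """
    responses = np.asarray(responses, dtype=float)
    p_hi = p_2pl(cutoff + delta, a, b)
    p_lo = p_2pl(cutoff - delta, a, b)
    # Log-likelihood ratio accumulated over the administered items.
    llr = np.sum(responses * np.log(p_hi / p_lo)
                 + (1 - responses) * np.log((1 - p_hi) / (1 - p_lo)))
    if llr >= np.log((1 - beta) / alpha):
        return "above"
    if llr <= np.log(beta / (1 - alpha)):
        return "below"
    return "continue"

# Example: 10 administered items, true theta = 0.8, cutoff at 0.
rng = np.random.default_rng(1)
a = rng.uniform(0.8, 2.0, 10)
b = rng.uniform(-1.0, 1.0, 10)
x = rng.binomial(1, p_2pl(0.8, a, b))
print(sprt_decision(x, a, b, cutoff=0.0))
```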

Citations: 0
Investigating the Effect of Differential Rapid Guessing on Population Invariance in Equating.
IF 1 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2022-10-01 | Epub Date: 2022-06-16 | DOI: 10.1177/01466216221108991
Jiayi Deng, Joseph A Rios

Score equating is an essential tool in improving the fairness of test score interpretations when employing multiple test forms. To ensure that the equating functions used to connect scores from one form to another are valid, they must be invariant across different populations of examinees. Given that equating is used in many low-stakes testing programs, examinees' test-taking effort should be considered carefully when evaluating population invariance in equating, particularly as the occurrence of rapid guessing (RG) has been found to differ across subgroups. To this end, the current study investigated whether differential RG rates between subgroups can lead to incorrect inferences concerning population invariance in test equating. A simulation was built to generate data for two examinee subgroups (one more motivated than the other) administered two alternative forms of multiple-choice items. The rate of RG and ability characteristics of rapid guessers were manipulated. Results showed that as RG responses increased, false positive and false negative inferences of equating invariance were respectively observed at the lower and upper ends of the observed score scale. This result was supported by an empirical analysis of an international assessment. These findings suggest that RG should be investigated and documented prior to test equating, especially in low-stakes assessment contexts. A failure to do so may lead to incorrect inferences concerning fairness in equating.
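Rapid guessing is typically identified from response times. As a hedged illustration of how differential RG rates between subgroups might be generated and flagged, the sketch below uses a simple normative threshold (a fixed fraction of each item's median response time); the threshold rule, data layout, and all constants are our assumptions, not the study's design.

```python
import numpy as np

def flag_rapid_guesses(rt, threshold_frac=0.10):
    """Flag responses faster than a fixed fraction of each item's
    median response time (one simple normative threshold rule)."""
    thresholds = threshold_frac * np.median(rt, axis=0)  # per-item cutoffs
    return rt < thresholds

rng = np.random.default_rng(7)
rt = rng.lognormal(mean=3.0, sigma=0.5, size=(500, 40))  # seconds
# Inject rapid guessing for a less-motivated subgroup (rows 0-99)
# on the first ten items.
rt[:100, :10] = rng.uniform(0.5, 2.0, size=(100, 10))
rg = flag_rapid_guesses(rt)
print("RG rate, low-motivation group:", round(rg[:100].mean(), 3))
print("RG rate, comparison group:   ", round(rg[100:].mean(), 3))
```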

Citations: 0
Multistage Testing in Heterogeneous Populations: Some Design and Implementation Considerations.
IF 1.2 | CAS Tier 4 (Psychology) | Q2 Social Sciences | Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108123
Leslie Rutkowski, Yuan-Ling Liaw, Dubravka Svetina, David Rutkowski

A central challenge in international large-scale assessments is adequately measuring dozens of highly heterogeneous populations, many of which are low performers. To that end, multistage adaptive testing offers one possibility for better assessing across the achievement continuum. This study examines the way that several multistage test design and implementation choices can impact measurement performance in this setting. To attend to gaps in the knowledge base, we extended previous research to include multiple, linked panels, more appropriate estimates of achievement, and multiple populations of varied proficiency. Including achievement distributions from varied populations and associated item parameters, we design and execute a simulation study that mimics an established international assessment. We compare several routing schemes and varied module lengths in terms of item and person parameter recovery. Our findings suggest that, particularly for low-performing populations, multistage testing offers precision advantages. Further, findings indicate that equal module lengths (desirable for controlling position effects) and classical routing methods, which lower the technological burden of implementing such a design, produce good results. Finally, probabilistic misrouting offers advantages over merit routing for controlling bias in item and person parameters. Overall, multistage testing shows promise for extending the scope of international assessments. We discuss the importance of our findings for operational work in the international assessment domain.
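To make the routing contrast concrete, here is a hedged sketch contrasting deterministic number-correct ("classical") routing with probabilistic misrouting, in which examinees near the cutoff can be deliberately routed to the other module; the logistic routing curve and all constants are invented for illustration and are not the authors' specification.

```python
import numpy as np

def route_classical(num_correct, cutoff):
    """Deterministic number-correct routing to the second stage."""
    return np.where(num_correct >= cutoff, "hard", "easy")

def route_probabilistic(num_correct, cutoff, scale=1.5, rng=None):
    """Probabilistic (mis)routing: the chance of the hard module rises
    smoothly with the routing score instead of jumping at the cutoff."""
    if rng is None:
        rng = np.random.default_rng()
    p_hard = 1.0 / (1.0 + np.exp(-(num_correct - cutoff) / scale))
    return np.where(rng.random(num_correct.shape) < p_hard, "hard", "easy")

rng = np.random.default_rng(11)
nc = rng.binomial(10, 0.55, size=1000)  # scores on a 10-item routing module
print("classical, share routed hard:    ",
      np.mean(route_classical(nc, 6) == "hard"))
print("probabilistic, share routed hard:",
      np.mean(route_probabilistic(nc, 6, rng=rng) == "hard"))
```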

Citations: 2
Characterizing Sampling Variability for Item Response Theory Scale Scores in a Fixed-Parameter Calibrated Projection Design.
IF 1.2 | CAS Tier 4 (Psychology) | Q2 Social Sciences | Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108136
Shuangshuang Xu, Yang Liu

A common linking practice uses estimated item parameters to calculate projected scores. This procedure fails to account for carry-over sampling variability, and neglecting that variability can lead to understated uncertainty in Item Response Theory (IRT) scale scores. To address the issue, we apply a Multiple Imputation (MI) approach to adjust the posterior standard deviations of IRT scale scores. The MI procedure involves drawing multiple sets of plausible values from an approximate sampling distribution of the estimated item parameters. When the two scales to be linked were previously calibrated, item parameters can be fixed at their originally published scales, and the latent variable means and covariances of the two scales can then be estimated conditional on the fixed item parameters. The conditional estimation procedure is a special case of Restricted Recalibration (RR), in which the asymptotic sampling distribution of estimated parameters follows from the general theory of pseudo Maximum Likelihood (ML) estimation. We evaluate the combination of RR and MI in a simulation study examining the impact of carry-over sampling variability under various conditions. We also illustrate how to apply the proposed method to real data by revisiting Thissen et al. (2015).
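The MI adjustment described here rescores examinees under multiple plausible item-parameter draws and then pools the results. A minimal sketch of the pooling step, using Rubin's rules, is given below; the inputs are assumed to come from M separate scoring runs, and all numbers are fabricated for illustration.

```python
import numpy as np

def pool_mi(theta_hats, psds):
    """Combine M scoring runs by Rubin's rules.

    theta_hats, psds: shape-(M,) arrays holding the scale score and its
    posterior SD obtained under each plausible item-parameter draw.
    """
    m = len(theta_hats)
    within = np.mean(psds ** 2)          # average squared PSD
    between = theta_hats.var(ddof=1)     # spread across draws
    total = within + (1 + 1 / m) * between
    return theta_hats.mean(), np.sqrt(total)

# Fabricated results from 20 plausible item-parameter draws.
rng = np.random.default_rng(3)
theta_hats = 0.40 + 0.05 * rng.standard_normal(20)
psds = np.full(20, 0.30)
est, adj_sd = pool_mi(theta_hats, psds)
print(f"score = {est:.3f}, MI-adjusted PSD = {adj_sd:.3f}")  # PSD > 0.30
```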

Citations: 0
Application of Sampling Variance of Item Response Theory Parameter Estimates in Detecting Outliers in Common Item Equating.
IF 1.2 | CAS Tier 4 (Psychology) | Q2 Social Sciences | Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108122
Chunyan Liu, Daniel Jurich

In common item equating, the existence of item outliers may impact the accuracy of equating results and carry significant ramifications for the validity of test score interpretations. Therefore, common item equating should involve a screening process that flags outlying items and excludes them from the common item set before equating is conducted. The current simulation study demonstrated that the sampling variance associated with item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2PL and 3PL IRT models. The results showed that the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria. Based on these favorable results, item outlier detection statistics based on estimated sampling variability warrant further consideration in both research and practice.
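A hedged reconstruction of the underlying idea: compare each common item's parameter displacement across forms against its combined sampling error and flag items with improbably large standardized displacements. The exact SV statistic in the article may be defined differently; this simplified b-parameter version is our assumption.

```python
import numpy as np

def flag_outliers(b_old, b_new, se_old, se_new, z_crit=2.58):
    """Flag common items whose b-parameter displacement is large
    relative to its combined sampling SE (simplified SV-style check)."""
    displacement = b_new - b_old
    se = np.sqrt(se_old ** 2 + se_new ** 2)
    z = displacement / se
    return np.abs(z) > z_crit, z

b_old = np.array([-0.5, 0.2, 1.1, 0.0])
b_new = np.array([-0.4, 0.3, 1.9, 0.1])   # the third item has drifted
se = np.full(4, 0.12)
flags, z = flag_outliers(b_old, b_new, se, se)
print(flags)           # only the drifted item should be flagged
print(np.round(z, 2))
```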

Citations: 0
Two New Models for Item Preknowledge.
IF 1.2 | CAS Tier 4 (Psychology) | Q2 Social Sciences | Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108130
Kylie Gorney, James A Wollack

To evaluate preknowledge detection methods, researchers often conduct simulation studies in which they use models to generate the data. In this article, we propose two new models to represent item preknowledge. In contrast to existing models, we allow the impact of preknowledge to vary across persons and items in order to better represent situations encountered in practice. We use three real data sets to evaluate the fit of the new models with respect to two types of preknowledge: items only, and items together with the correct answer key. Results show that the two new models provide the best fit compared to several other existing preknowledge models. Furthermore, model parameter estimates were found to vary substantially depending on the type of preknowledge being considered, indicating that answer key disclosure has a profound impact on testing behavior.
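To make the modeling idea concrete, the sketch below generates 2PL response data in which the impact of preknowledge varies across persons (who has access, and how much it helps them) and across items (which items are disclosed). This parameterization is illustrative only and is not either of the authors' two models.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probabilities for all persons x items."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def gen_preknowledge_data(n_persons=1000, n_items=40, seed=5):
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_persons)
    a = rng.uniform(0.8, 2.0, n_items)
    b = rng.uniform(-2.0, 2.0, n_items)
    p = p_2pl(theta, a, b)
    # Preknowledge impact varies over persons and items:
    access = rng.binomial(1, 0.2, n_persons)[:, None]   # who saw material
    leaked = rng.binomial(1, 0.5, n_items)[None, :]     # which items leaked
    gain = rng.uniform(0.3, 1.0, n_persons)[:, None]    # person-specific boost
    p_cheat = p + gain * (1.0 - p)                      # pushed toward 1.0
    p_eff = np.where(access * leaked == 1, p_cheat, p)
    return rng.binomial(1, p_eff)

x = gen_preknowledge_data()
print(x.shape, round(x.mean(), 3))
```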

Citations: 2
Item-Fit Statistic Based on Posterior Probabilities of Membership in Ability Groups.
IF 1.2 | CAS Tier 4 (Psychology) | Q2 Social Sciences | Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108061
Bartosz Kondratek

A novel approach to item-fit analysis based on an asymptotic test is proposed. The new test statistic, χ²_w, compares pseudo-observed and expected item mean scores over a set of ability bins. The item mean scores are computed as weighted means, with weights based on test-takers' a posteriori density of ability within the bin. This article explores the properties of χ²_w in the case of dichotomously scored items under unidimensional IRT models. Monte Carlo experiments were conducted to analyze the performance of χ²_w. The Type I error rate of χ²_w was acceptably close to the nominal level, and the statistic had greater power than Orlando and Thissen's S−X². Under some conditions, the power of χ²_w also exceeded that reported for the computationally more demanding Stone's χ²*.
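A hedged sketch of the statistic's core computation: bin the ability scale, form pseudo-observed item mean scores weighted by each test-taker's posterior density mass in a bin, and accumulate standardized squared residuals against the model-expected means. The binning, weighting details, and reference distribution are simplified assumptions here, not the article's full definition.

```python
import numpy as np

def chi2w_core(post_weights, responses, p_expected):
    """Core of a weighted item-fit comparison over ability bins.

    post_weights: (n_persons, n_bins) posterior mass per person per bin
    responses:    (n_persons,) dichotomous scores on the studied item
    p_expected:   (n_bins,) model-implied P(correct) at the bin locations
    """
    wsum = post_weights.sum(axis=0)            # effective bin sizes
    obs = post_weights.T @ responses / wsum    # pseudo-observed means
    var = p_expected * (1 - p_expected)        # binomial variance
    return np.sum(wsum * (obs - p_expected) ** 2 / var)

# Toy example: six persons, four ability bins.
w = np.array([[0.7, 0.3, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.6, 0.4, 0.0], [0.0, 0.2, 0.8, 0.0],
              [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.1, 0.9]])
x = np.array([0, 1, 0, 1, 1, 1])
p = np.array([0.2, 0.4, 0.6, 0.8])
print(round(chi2w_core(w, x, p), 3))
```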

Citations: 3
Item Response Theory True Score Equating for the Bifactor Model Under the Common-Item Nonequivalent Groups Design.
IF 1 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2022-09-01 | Epub Date: 2022-06-17 | DOI: 10.1177/01466216221108995
Kyung Yong Kim

Applying item response theory (IRT) true score equating to multidimensional IRT models is not straightforward due to the one-to-many relationship between a true score and latent variables. Under the common-item nonequivalent groups design, the purpose of the current study was to introduce two IRT true score equating procedures that adopted different dimension reduction strategies for the bifactor model. The first procedure, which was referred to as the integration procedure, linked the latent variable scales for the bifactor model and integrated out the specific factors from the item response function of the bifactor model. Then, IRT true score equating was applied to the marginalized bifactor model. The second procedure, which was referred to as the PIRT-based procedure, projected the specific dimensions onto the general dimension to obtain a locally dependent unidimensional IRT (UIRT) model and linked the scales of the UIRT model, followed by the application of IRT true score equating to the locally dependent UIRT model. Equating results obtained with the two equating procedures along with those obtained with the unidimensional three-parameter logistic (3PL) model were compared using both simulated and real data. In general, the integration and PIRT-based procedures provided equating results that were not practically different. Furthermore, the equating results produced by the two bifactor-based procedures became more accurate than the results returned by the 3PL model as tests became more multidimensional.
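The integration procedure marginalizes the specific factors out of the bifactor item response function, leaving an IRF in the general factor alone. A minimal numerical sketch for one specific factor is shown below; the loadings and the simple fixed-grid quadrature over the N(0, 1) prior are illustrative assumptions, not the article's exact computation.

```python
import numpy as np

def bifactor_irf(theta_g, theta_s, a_g, a_s, d):
    """2PL-type bifactor IRF with a general and one specific factor."""
    return 1.0 / (1.0 + np.exp(-(a_g * theta_g + a_s * theta_s + d)))

def marginal_irf(theta_g, a_g, a_s, d, n_quad=41):
    """Integrate the specific factor out over its N(0, 1) prior using a
    simple fixed grid, leaving an IRF in the general factor only."""
    nodes = np.linspace(-4.0, 4.0, n_quad)
    weights = np.exp(-0.5 * nodes ** 2)
    weights /= weights.sum()                 # normalized prior weights
    return np.sum(weights * bifactor_irf(theta_g, nodes, a_g, a_s, d))

for th in (-1.0, 0.0, 1.0):
    print(th, round(marginal_irf(th, a_g=1.2, a_s=0.8, d=-0.3), 3))
```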

Citations: 0
Factor Retention Using Machine Learning With Ordinal Data.
IF 1.2 | CAS Tier 4 (Psychology) | Q2 Social Sciences | Pub Date: 2022-07-01 | Epub Date: 2022-05-04 | DOI: 10.1177/01466216221089345
David Goretzko, Markus Bühner

Determining the number of factors in exploratory factor analysis is probably the most crucial decision when conducting the analysis, as it clearly influences the meaningfulness of the results (i.e., factorial validity). A new method called the Factor Forest, which combines data simulation and machine learning, has been developed recently. This method reached very high accuracy for multivariate normal data, but it has not yet been tested with ordinal data. Hence, in this simulation study, we evaluated the Factor Forest with ordinal data based on different numbers of categories (2-6 categories) and compared it to common factor retention criteria. It showed higher overall accuracy for all types of ordinal data than all of the factor retention criteria used for comparison (Parallel Analysis, Comparison Data, the Empirical Kaiser Criterion, and the Kaiser-Guttman Rule). The results indicate that the Factor Forest is applicable to ordinal data with at least five categories (the typical scale in questionnaire research) in the majority of conditions, and to binary or ordinal data based on items with fewer categories when the sample size is large.
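The Factor Forest trains a classifier on features of many simulated datasets and then predicts the number of factors for an observed dataset. The sketch below is a heavily simplified, hedged stand-in for that pipeline, using sorted eigenvalues as features and a random forest classifier; the published method uses a much richer feature set and training design.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def simulate_eigs(n_factors, n_items=12, n_obs=500, rng=None):
    """Simulate ordinal factor data and return its sorted eigenvalues."""
    if rng is None:
        rng = np.random.default_rng()
    L = np.zeros((n_items, n_factors))
    for j in range(n_items):                 # simple-structure loadings
        L[j, j % n_factors] = rng.uniform(0.4, 0.8)
    psi = 1.0 - (L ** 2).sum(axis=1)         # unique variances
    f = rng.standard_normal((n_obs, n_factors))
    x = f @ L.T + rng.standard_normal((n_obs, n_items)) * np.sqrt(psi)
    x = np.digitize(x, bins=[-1.5, -0.5, 0.5, 1.5])   # five categories
    return np.sort(np.linalg.eigvalsh(np.corrcoef(x.T)))[::-1]

rng = np.random.default_rng(42)
X, y = [], []
for k in (1, 2, 3):                          # candidate factor counts
    for _ in range(200):
        X.append(simulate_eigs(k, rng=rng))
        y.append(k)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.predict([simulate_eigs(2, rng=rng)]))   # ideally -> [2]
```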

Citations: 6
glca: An R Package for Multiple-Group Latent Class Analysis.
IF 1 | CAS Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2022-07-01 | Epub Date: 2022-05-11 | DOI: 10.1177/01466216221084197
Youngsun Kim, Saebom Jeon, Chi Chang, Hwan Chung

Group similarities and differences may manifest themselves in a variety of ways in multiple-group latent class analysis (LCA). Sometimes, measurement models are identical across groups in LCA. In other situations, the measurement models may differ, suggesting that the latent structure itself is different between groups. Tests of measurement invariance shed light on this distinction. We created an R package, glca, that implements procedures for exploring differences in latent class structure between populations, taking multilevel data structure into account. The glca package deals with fixed-effect LCA and nonparametric random-effect LCA; the former can be applied when populations are segmented by the observed group variable itself, whereas the latter can be used, by identifying a group-level latent variable, when the group variable has too many levels for meaningful group comparisons. The glca package consists of functions for statistical test procedures for exploring group differences in various LCA models that account for multilevel data structure.

Citations: 0