
Latest Publications in Applied Psychological Measurement

Impact of Parameter Predictability and Joint Modeling of Response Accuracy and Response Time on Ability Estimates.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2025-02-26 · DOI: 10.1177/01466216251322646
Maryam Pezeshki, Susan Embretson

To maintain test quality, a large supply of items is typically desired. Automatic item generation can reduce cost and labor, especially if the generated items have predictable item parameters, possibly reducing or eliminating the need for empirical tryout. However, the effect of different levels of item parameter predictability on the accuracy of trait estimation under item response theory models is unclear. If predictability is lower, adding response time as a collateral source of information may mitigate the effect on trait estimation accuracy. The present study investigates the impact of varying item parameter predictability on trait estimation accuracy, along with the impact of adding response time as a collateral source of information. Results indicated that trait estimation accuracy using item family model-based item parameters differed only slightly from that using known item parameters. Somewhat larger trait estimation errors resulted from using cognitive complexity features to predict item parameters. Further, adding response times to the model resulted in more accurate trait estimation for tests with lower item difficulty levels (e.g., achievement tests). Implications for item generation and for the response-processes aspect of validity are discussed.
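
The first comparison in this abstract can be illustrated with a minimal simulation sketch (not the authors' design): simulate 2PL responses, then estimate ability by EAP using either the true item parameters or noisy "predicted" ones, and compare recovery error. The item counts, noise levels, and prior below are arbitrary assumptions, and the joint response-time component of the study is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 1000, 30
theta = rng.normal(0, 1, n_persons)          # true abilities
a = rng.lognormal(0, 0.3, n_items)           # true discriminations
b = rng.normal(0, 1, n_items)                # true difficulties

# Simulate 2PL item responses
prob = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
x = rng.binomial(1, prob)

# "Predicted" parameters: true values plus prediction error; the noise SDs
# stand in for higher vs. lower item parameter predictability
a_pred = a * np.exp(rng.normal(0, 0.15, n_items))
b_pred = b + rng.normal(0, 0.30, n_items)

def eap(x, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimates under a 2PL with a standard normal prior."""
    p = 1 / (1 + np.exp(-a[None, :] * (grid[:, None] - b[None, :])))  # grid x items
    loglik = x @ np.log(p).T + (1 - x) @ np.log(1 - p).T              # persons x grid
    post = np.exp(loglik) * np.exp(-grid ** 2 / 2)                    # times N(0,1) prior
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid

for label, (aa, bb) in {"known parameters": (a, b),
                        "predicted parameters": (a_pred, b_pred)}.items():
    rmse = np.sqrt(np.mean((eap(x, aa, bb) - theta) ** 2))
    print(f"{label}: RMSE = {rmse:.3f}")
```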

Citations: 0
Few and Different: Detecting Examinees With Preknowledge Using Extended Isolation Forests.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2025-02-20 · DOI: 10.1177/01466216251320403
Nate R Smith, Lisa A Keller, Richard A Feinberg, Chunyan Liu

Item preknowledge refers to the case where examinees have advance knowledge of test material prior to taking the examination. When examinees have item preknowledge, the scores that result from those item responses are not true reflections of the examinee's proficiency. Further, this contamination in the data also affects the item parameter estimates and therefore the scores of all examinees, regardless of whether they had prior knowledge. To ensure the validity of test scores, it is essential to identify both issues: compromised items (CIs) and examinees with preknowledge (EWPs). In some cases, the CIs are known, and the task reduces to determining the EWPs. However, due to the potential threat to validity, it is critical for high-stakes testing programs to have a process for routinely monitoring for evidence of EWPs, often when CIs are unknown. Further, even knowing that specific items may have been compromised does not guarantee that any examinees had prior access to those items, or that the examinees who did have prior access know how to use the preknowledge effectively. Therefore, this paper attempts to use response behavior to identify item preknowledge without knowledge of which items may or may not have been compromised. While most research in this area has relied on traditional psychometric models, we investigate the utility of an unsupervised machine learning algorithm, the extended isolation forest (EIF), to detect EWPs. Similar to previous research, the response behavior being analyzed is response time (RT) and response accuracy (RA).
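
A hedged sketch of the detection idea: the study uses the extended isolation forest, but as a stand-in the snippet below applies scikit-learn's standard IsolationForest to two illustrative per-examinee features (proportion correct and mean log response time). The simulated data, feature set, and contamination rate are assumptions made only for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n_honest, n_ewp, n_items = 950, 50, 60

# Honest examinees: typical accuracy and mean log response times
acc_honest = rng.binomial(n_items, 0.65, n_honest) / n_items
logrt_honest = rng.normal(4.0, 0.3, n_honest)

# Examinees with preknowledge (EWPs): unusually fast and unusually accurate
acc_ewp = rng.binomial(n_items, 0.92, n_ewp) / n_items
logrt_ewp = rng.normal(3.2, 0.3, n_ewp)

X = np.column_stack([np.r_[acc_honest, acc_ewp], np.r_[logrt_honest, logrt_ewp]])
truth = np.r_[np.zeros(n_honest), np.ones(n_ewp)]

forest = IsolationForest(n_estimators=200, contamination=0.05, random_state=0)
flagged = forest.fit_predict(X) == -1      # -1 marks the "few and different" points

print(f"flagged {flagged.sum()} examinees, {int(truth[flagged].sum())} of them true EWPs")
```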

Citations: 0
Application of Bayesian Decision Theory in Detecting Test Fraud.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2025-01-27 · DOI: 10.1177/01466216251316559
Sandip Sinharay, Matthew S Johnson

This article suggests a new approach based on Bayesian decision theory (e.g., Cronbach & Gleser, 1965; Ferguson, 1967) for detection of test fraud. The approach leads to a simple decision rule that involves the computation of the posterior probability that an examinee committed test fraud given the data. The suggested approach was applied to a real data set that involved actual test fraud.
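
A toy numeric sketch of such a decision rule (the prior, likelihoods, and losses are invented for illustration and are not from the article): compute the posterior probability of fraud from a base rate and from the likelihood of the observed evidence under "fraud" versus "no fraud", then flag the examinee only if that posterior exceeds the loss-ratio threshold implied by Bayesian decision theory.

```python
# Illustrative inputs (not from the article)
prior_fraud = 0.02      # assumed base rate of test fraud
lik_fraud = 0.40        # P(observed evidence | fraud)
lik_clean = 0.01        # P(observed evidence | no fraud)

# Posterior probability that this examinee committed fraud
posterior = (prior_fraud * lik_fraud) / (
    prior_fraud * lik_fraud + (1 - prior_fraud) * lik_clean
)

# Decision rule minimizing expected loss: flag iff
# posterior * loss_miss > (1 - posterior) * loss_false_flag
loss_false_flag = 10.0  # cost of flagging an innocent examinee
loss_miss = 1.0         # cost of missing actual fraud
threshold = loss_false_flag / (loss_false_flag + loss_miss)

print(f"posterior = {posterior:.3f}, threshold = {threshold:.3f}, "
      f"flag = {posterior > threshold}")
```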

Citations: 0
Evaluating the Construct Validity of Instructional Manipulation Checks as Measures of Careless Responding to Surveys.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-20 · DOI: 10.1177/01466216241284293
Mark C Ramsey, Nathan A Bowling, Preston S Menke

Careless responding measures are important for several purposes, whether it's screening for careless responding or for research centered on careless responding as a substantive variable. One such approach for assessing carelessness in surveys is the use of an instructional manipulation check. Despite its apparent popularity, little is known about the construct validity of instructional manipulation checks as measures of careless responding. Initial results are inconclusive, and no study has thoroughly evaluated the validity of the instructional manipulation check as a measure of careless responding. Across 2 samples (N = 762), we evaluated the construct validity of the instructional manipulation check under a nomological network. We found that the instructional manipulation check converged poorly with other measures of careless responding, weakly predicted participant inability to recognize study content, and did not display incremental validity over existing measures of careless responding. Additional analyses revealed that instructional manipulation checks performed poorly compared to single scores of other alternative careless responding measures and that screening data with alternative measures of careless responding produced greater or similar gains in data quality to instructional manipulation checks. Based on the results of our studies, we do not recommend using instructional manipulation checks to assess or screen for careless responding to surveys.

Citations: 0
Estimating Test-Retest Reliability in the Presence of Self-Selection Bias and Learning/Practice Effects.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-17 · DOI: 10.1177/01466216241284585
William C M Belzak, J R Lockwood

Test-retest reliability is often estimated using naturally occurring data from test repeaters. In settings such as admissions testing, test takers choose if and when to retake an assessment. This self-selection can bias estimates of test-retest reliability because individuals who choose to retest are typically unrepresentative of the broader testing population and because differences among test takers in learning or practice effects may increase with time between test administrations. We develop a set of methods for estimating test-retest reliability from observational data that can mitigate these sources of bias, which include sample weighting, polynomial regression, and Bayesian model averaging. We demonstrate the value of using these methods for reducing bias and improving precision of estimated reliability using empirical and simulated data, both of which are based on more than 40,000 repeaters of a high-stakes English language proficiency test. Finally, these methods generalize to settings in which only a single, error-prone measurement is taken repeatedly over time and where self-selection and/or changes to the underlying construct may be at play.
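
One ingredient named in the abstract, sample weighting, can be sketched as an inverse-probability-weighted correlation between first and second attempts; the retest-probability model, the polynomial-regression adjustment, and the Bayesian model averaging used in the paper are omitted, and all numbers below are simulated assumptions.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Pearson correlation with observation weights."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    return cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))

rng = np.random.default_rng(7)
n = 5000
true_score = rng.normal(0, 1, n)
test1 = true_score + rng.normal(0, 0.5, n)       # first attempt
test2 = true_score + rng.normal(0, 0.5, n)       # hypothetical second attempt

# Self-selection: examinees with low first scores are more likely to retest
p_retest = 1 / (1 + np.exp(2.0 * test1))
retested = rng.random(n) < p_retest

naive = np.corrcoef(test1[retested], test2[retested])[0, 1]
ipw = weighted_corr(test1[retested], test2[retested], 1 / p_retest[retested])
print(f"naive retest r = {naive:.3f}, inverse-probability-weighted r = {ipw:.3f}")
```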

Citations: 0
A Mark-Recapture Approach to Estimating Item Pool Compromise.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-13 · DOI: 10.1177/01466216241284410
Richard A Feinberg

Testing organizations routinely investigate whether secure exam material has been compromised and is consequently invalid for scoring and for inclusion on future assessments. Beyond identifying individual compromised items, knowing the degree to which a form is compromised can inform decisions on whether the form can no longer be administered, or whether an item pool is compromised to such an extent that serious action on a broad scale must be taken to ensure the validity of score interpretations. Previous research on estimating the population of compromised items is sparse; however, closely related population-estimation problems have long been studied in ecological research. In this note, we exemplify the utility of the mark-recapture technique to estimate the population of compromised items, first through a brief demonstration to introduce the fundamental concepts and then a more realistic scenario to illustrate applicability to large-scale testing programs. An effective use of this technique would be to longitudinally track changes in the estimated population to inform operational test security strategies. Many variations on mark-recapture exist, and interpretation of the estimated population depends on several factors. Thus, this note is only meant to introduce the concept of mark-recapture as a useful application to evaluate a testing organization's compromise mitigation procedures.
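
The core idea is easy to show with the simplest mark-recapture estimators (the note's brief demonstration may use a different variant; the numbers here are invented). If a first sweep for exposed content finds n1 compromised items, an independent second sweep finds n2, and m items appear in both, the Lincoln-Petersen estimate of the total compromised pool is n1*n2/m; Chapman's correction is less biased when the overlap m is small.

```python
def lincoln_petersen(n1, n2, m):
    """Classic mark-recapture estimate of the total compromised-item count."""
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected version, safer when the overlap m is small."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Illustrative sweeps: 40 items found in the first search, 35 in the second,
# 14 of which had already been found the first time
n1, n2, m = 40, 35, 14
print(f"Lincoln-Petersen estimate: {lincoln_petersen(n1, n2, m):.0f} compromised items")
print(f"Chapman estimate:          {chapman(n1, n2, m):.0f} compromised items")
```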

Citations: 0
Effect of Differential Item Functioning on Computer Adaptive Testing Under Different Conditions.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-17 · DOI: 10.1177/01466216241284295
Merve Sahin Kursad, Seher Yalcin

This study provides an overview of the effect of differential item functioning (DIF) on measurement precision, test information function (TIF), and test effectiveness in computer adaptive tests (CATs). Simulated data for the study were produced and analyzed with RStudio. During the data generation process, item pool size, DIF type, DIF percentage, item selection method for CAT, and the test termination rules were treated as manipulated conditions. Sample size and ability parameter distribution, Item Response Theory (IRT) model, DIF size, ability estimation method, test starting rule, and item usage frequency method regarding CAT conditions were held fixed. To examine the effect of DIF, measurement precision, TIF, and test effectiveness were calculated. Results show DIF has negative effects on measurement precision, TIF, and test effectiveness. In particular, statistically significant effects of the percentage of DIF items and DIF type are observed on measurement precision.
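
A small sketch of one outcome the study tracks, the test information function (TIF) under a 2PL, showing how uniform DIF (a difficulty shift on some items for the focal group) changes the information those same items provide to that group. The item parameters, DIF size, and percentage of DIF items below are arbitrary choices, not the study's simulation design.

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

rng = np.random.default_rng(3)
n_items = 40
a = rng.lognormal(0, 0.3, n_items)
b = rng.normal(0, 1, n_items)

# Uniform DIF: 20% of items are 0.6 logits harder for the focal group
dif_items = rng.choice(n_items, size=n_items // 5, replace=False)
b_focal = b.copy()
b_focal[dif_items] += 0.6

for theta in np.linspace(-3, 3, 7):
    tif_ref = item_info_2pl(theta, a, b).sum()
    tif_focal = item_info_2pl(theta, a, b_focal).sum()
    print(f"theta = {theta:+.1f}: TIF reference = {tif_ref:5.2f}, TIF focal = {tif_focal:5.2f}")
```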

Citations: 0
Item Response Modeling of Clinical Instruments With Filter Questions: Disentangling Symptom Presence and Severity.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-09-01 · Epub Date: 2024-06-17 · DOI: 10.1177/01466216241261709
Brooke E Magnus

Clinical instruments that use a filter/follow-up response format often produce data with excess zeros, especially when administered to nonclinical samples. When the unidimensional graded response model (GRM) is then fit to these data, parameter estimates and scale scores tend to suggest that the instrument measures individual differences only among individuals with severe levels of the psychopathology. In such scenarios, alternative item response models that explicitly account for excess zeros may be more appropriate. The multivariate hurdle graded response model (MH-GRM), which has been previously proposed for handling zero-inflated questionnaire data, includes two latent variables: susceptibility, which underlies responses to the filter question, and severity, which underlies responses to the follow-up question. Using both simulated and empirical data, the current research shows that compared to unidimensional GRMs, the MH-GRM is better able to capture individual differences across a wider range of psychopathology, and that when unidimensional GRMs are fit to data from questionnaires that include filter questions, individual differences at the lower end of the severity continuum largely go unmeasured. Practical implications are discussed.
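
A data-generating sketch of the two-part structure described above, under assumed parameter values (this is not the authors' estimation code): a 2PL "susceptibility" process determines whether the symptom is endorsed at all, and a graded-response "severity" process generates the follow-up category only for endorsers; non-endorsers are recorded as zero.

```python
import numpy as np

rng = np.random.default_rng(11)
n_persons, n_items = 2000, 10

susceptibility = rng.normal(0, 1, n_persons)
severity = 0.6 * susceptibility + rng.normal(0, 0.8, n_persons)   # correlated latent traits

# For brevity every item shares the same parameters (a real instrument would not)
a_sus, b_sus = 1.2, 0.5                       # filter ("symptom present?") parameters
a_sev = 1.5
thresholds = np.array([-0.5, 0.5, 1.5])       # ordered GRM thresholds for categories 2..4

def simulate_item():
    present = rng.random(n_persons) < 1 / (1 + np.exp(-a_sus * (susceptibility - b_sus)))
    # GRM follow-up: P(Y >= k) = logistic(a_sev * (severity - threshold_k)), categories 1..4
    p_ge = 1 / (1 + np.exp(-a_sev * (severity[:, None] - thresholds[None, :])))
    category = 1 + (rng.random((n_persons, 1)) < p_ge).sum(axis=1)
    return np.where(present, category, 0)     # 0 = symptom absent (filter not passed)

data = np.column_stack([simulate_item() for _ in range(n_items)])
print("proportion of zero responses:", (data == 0).mean().round(2))
```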

Citations: 0
Using Auxiliary Item Information in the Item Parameter Estimation of a Graded Response Model for a Small to Medium Sample Size: Empirical Versus Hierarchical Bayes Estimation
CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2023-11-03 · DOI: 10.1177/01466216231209758
Matthew Naveiras, Sun-Joo Cho
Marginal maximum likelihood estimation (MMLE) is commonly used for item response theory item parameter estimation. However, sufficiently large sample sizes are not always possible when studying rare populations. In this paper, empirical Bayes and hierarchical Bayes are presented as alternatives to MMLE in small sample sizes, using auxiliary item information to estimate the item parameters of a graded response model with higher accuracy. Empirical Bayes and hierarchical Bayes methods are compared with MMLE to determine under what conditions these Bayes methods can outperform MMLE, and to determine if hierarchical Bayes can act as an acceptable alternative to MMLE in conditions where MMLE is unable to converge. In addition, empirical Bayes and hierarchical Bayes methods are compared to show how hierarchical Bayes can result in estimates of posterior variance with greater accuracy than empirical Bayes by acknowledging the uncertainty of item parameter estimates. The proposed methods were evaluated via a simulation study. Simulation results showed that hierarchical Bayes methods can be acceptable alternatives to MMLE under various testing conditions, and we provide a guideline to indicate which methods would be recommended in different research situations. R functions are provided to implement these proposed methods.
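
The role of auxiliary item information can be caricatured with a normal-normal shrinkage step (a conceptual sketch only; the paper works with the full graded response model and provides its own R functions, which are not reproduced here): noisy small-sample item estimates are pulled toward predictions derived from auxiliary item features, with more shrinkage when the estimate is less precise.

```python
import numpy as np

rng = np.random.default_rng(5)
n_items = 20
b_true = rng.normal(0, 1, n_items)

# Auxiliary item information predicts difficulty imperfectly (prior means)
b_aux = b_true + rng.normal(0, 0.4, n_items)
tau2 = 0.4 ** 2                               # assumed prior variance around the prediction

# Small-sample item estimates with varying precision
se = rng.uniform(0.3, 0.6, n_items)
b_hat = b_true + rng.normal(0, se)

# Normal-normal posterior mean: precision-weighted blend of estimate and prior
b_post = (b_hat / se ** 2 + b_aux / tau2) / (1 / se ** 2 + 1 / tau2)

rmse = lambda est: np.sqrt(np.mean((est - b_true) ** 2))
print(f"RMSE, raw estimates: {rmse(b_hat):.3f}; with auxiliary prior: {rmse(b_post):.3f}")
```
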
Citations: 1
A Bayesian Random Weights Linear Logistic Test Model for Within-Test Practice Effects
CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2023-11-01 · DOI: 10.1177/01466216231209752
José H. Lozano, Javier Revuelta
The present paper introduces a random weights linear logistic test model for the measurement of individual differences in operation-specific practice effects within a single administration of a test. The proposed model is an extension of the linear logistic test model of learning developed by Spada (1977) in which the practice effects are considered random effects varying across examinees. A Bayesian framework was used for model estimation and evaluation. A simulation study was conducted to examine the behavior of the model in combination with the Bayesian procedures. The results demonstrated the good performance of the estimation and evaluation methods. Additionally, an empirical study was conducted to illustrate the applicability of the model to real data. The model was applied to a sample of responses from a logical ability test providing evidence of individual differences in operation-specific practice effects.
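
One plausible way to write such a model (a notational sketch under assumed structure, not necessarily the authors' exact parameterization) adds a person-specific random practice weight to Spada's linear logistic test model of learning:

```latex
\operatorname{logit} P(X_{pi} = 1)
  = \theta_p - \sum_{k} q_{ik}\,\eta_k + \sum_{k} q_{ik}\, t_{ik}\,\delta_{pk},
\qquad \delta_{pk} \sim N(\mu_k, \sigma_k^2),
```

where q_ik indicates whether item i requires operation k, eta_k is that operation's difficulty contribution, t_ik counts the examinee's prior within-test opportunities to practice operation k, and delta_pk is the person-specific random practice effect whose variance captures individual differences in learning from practice.
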
Citations: 0