Modeling Rapid Guessing Behaviors in Computer-Based Testlet Items
Pub Date: 2023-01-01 (Epub 2022-09-09). DOI: 10.1177/01466216221125177
Kuan-Yu Jin, Chia-Ling Hsu, Ming Ming Chiu, Po-Hsi Chen
In traditional test models, test items are independent, and test-takers respond slowly and thoughtfully to each item. However, some test items share a common stimulus (dependent test items in a testlet), and test-takers sometimes lack motivation, knowledge, or time (speededness), so they engage in rapid guessing (RG). Ignoring the dependence among responses to testlet items can negatively bias standard errors of measurement, and ignoring RG by fitting a simpler item response theory (IRT) model can bias the results. Because computer-based testing captures response times for testlet items, we propose a mixture testlet IRT model that combines item responses and response times to model RG behaviors in computer-based testlet items. Two simulation studies with Markov chain Monte Carlo estimation using the JAGS program showed (a) good recovery of the item and person parameters in this new model and (b) the harmful consequences of ignoring RG (biased parameter estimates: overestimated item difficulties, underestimated time intensities, underestimated latent speed parameters, and overestimated precision of the respondents' latent trait estimates). Applying IRT models with and without RG to data from a computer-based language test showed parameter differences resembling those in the simulations.
{"title":"Modeling Rapid Guessing Behaviors in Computer-Based Testlet Items.","authors":"Kuan-Yu Jin, Chia-Ling Hsu, Ming Ming Chiu, Po-Hsi Chen","doi":"10.1177/01466216221125177","DOIUrl":"10.1177/01466216221125177","url":null,"abstract":"<p><p>In traditional test models, test items are independent, and test-takers slowly and thoughtfully respond to each test item. However, some test items have a common stimulus (dependent test items in a testlet), and sometimes test-takers lack motivation, knowledge, or time (speededness), so they perform rapid guessing (RG). Ignoring the dependence in responses to testlet items can negatively bias standard errors of measurement, and ignoring RG by fitting a simpler item response theory (IRT) model can bias the results. Because computer-based testing captures response times on testlet responses, we propose a mixture testlet IRT model with item responses and response time to model RG behaviors in computer-based testlet items. Two simulation studies with Markov chain Monte Carlo estimation using the JAGS program showed (a) good recovery of the item and person parameters in this new model and (b) the harmful consequences of ignoring RG (biased parameter estimates: overestimated item difficulties, underestimated time intensities, underestimated respondent latent speed parameters, and overestimated precision of respondent latent estimates). The application of IRT models with and without RG to data from a computer-based language test showed parameter differences resembling those in the simulations.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"19-33"},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9679923/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Metropolis-Hastings Robbins-Monro Algorithm for High-Dimensional Diagnostic Classification Models
Pub Date: 2022-11-01 (Epub 2022-09-08). DOI: 10.1177/01466216221123981
Chen-Wei Liu
The expectation-maximization (EM) algorithm is a commonly used technique for parameter estimation of diagnostic classification models (DCMs) with a prespecified Q-matrix; however, it requires O(2^K) calculations in its expectation step, which significantly slows down the computation when the number of attributes, K, is large. This study proposes an efficient Metropolis-Hastings Robbins-Monro (eMHRM) algorithm, needing only O(K + 1) calculations in the Monte Carlo expectation step. Furthermore, the item parameters and structural parameters are approximated via the Robbins-Monro algorithm, which does not require time-consuming nonlinear optimization procedures. A series of simulation studies were conducted to compare the eMHRM with the EM and a Metropolis-Hastings (MH) algorithm regarding parameter recovery and execution time. The outcomes reveal that the eMHRM is much more computationally efficient than the EM and MH, and it tends to produce better estimates than the EM when K is large, suggesting that the eMHRM is a promising parameter estimation method for high-dimensional DCMs.
{"title":"Efficient Metropolis-Hastings Robbins-Monro Algorithm for High-Dimensional Diagnostic Classification Models.","authors":"Chen-Wei Liu","doi":"10.1177/01466216221123981","DOIUrl":"10.1177/01466216221123981","url":null,"abstract":"<p><p>The expectation-maximization (EM) algorithm is a commonly used technique for the parameter estimation of the diagnostic classification models (DCMs) with a prespecified Q-matrix; however, it requires <i>O</i>(2 <sup><i>K</i></sup> ) calculations in its expectation-step, which significantly slows down the computation when the number of attributes, <i>K</i>, is large. This study proposes an efficient Metropolis-Hastings Robbins-Monro (eMHRM) algorithm, needing only <i>O</i>(<i>K</i> + 1) calculations in the Monte Carlo expectation step. Furthermore, the item parameters and structural parameters are approximated via the Robbins-Monro algorithm, which does not require time-consuming nonlinear optimization procedures. A series of simulation studies were conducted to compare the eMHRM with the EM and a Metropolis-Hastings (MH) algorithm regarding the parameter recovery and execution time. The outcomes presented in this article reveal that the eMHRM is much more computationally efficient than the EM and MH, and it tends to produce better estimates than the EM when <i>K</i> is large, suggesting that the eMHRM is a promising parameter estimation method for high-dimensional DCMs.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"662-674"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9574082/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40656644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attenuation-Corrected Estimators of Reliability
Pub Date: 2022-11-01 (Epub 2022-09-15). DOI: 10.1177/01466216221108131
Jari Metsämuuronen
Estimates of reliability are usually attenuated and deflated because the item-score correlation (ρ_gX, Rit) embedded in the most widely used estimators is affected by several sources of mechanical error in the estimation. Empirical examples show that, in some types of datasets, the estimates by traditional alpha may be deflated by 0.40-0.60 units of reliability and those by maximal reliability by 0.40 units of reliability. This article proposes a new kind of estimator of correlation, the attenuation-corrected correlation (R_AC): the ratio of the observed correlation to the maximal possible correlation reachable by the given item and score. By replacing ρ_gX with R_AC in the known formulas of estimators of reliability, we obtain attenuation-corrected alpha, theta, omega, and maximal reliability, which all belong to a family of so-called deflation-corrected estimators of reliability.
{"title":"Attenuation-Corrected Estimators of Reliability.","authors":"Jari Metsämuuronen","doi":"10.1177/01466216221108131","DOIUrl":"https://doi.org/10.1177/01466216221108131","url":null,"abstract":"<p><p>The estimates of reliability are usually attenuated and deflated because the item-score correlation ( <math> <mrow><msub><mi>ρ</mi> <mrow><mi>g</mi> <mi>X</mi></mrow> </msub> </mrow> </math> , <i>Rit</i>) embedded in the most widely used estimators is affected by several sources of mechanical error in the estimation. Empirical examples show that, in some types of datasets, the estimates by traditional alpha may be deflated by 0.40-0.60 units of reliability and those by maximal reliability by 0.40 units of reliability. This article proposes a new kind of estimator of correlation: attenuation-corrected correlation (<i>R</i> <sub><i>AC</i></sub> ): the proportion of observed correlation with the maximal possible correlation reachable by the given item and score. By replacing <math> <mrow><msub><mi>ρ</mi> <mrow><mi>g</mi> <mi>X</mi></mrow> </msub> </mrow> </math> with <i>R</i> <sub><i>AC</i></sub> in known formulas of estimators of reliability, we get attenuation-corrected alpha, theta, omega, and maximal reliability which all belong to a family of so-called deflation-corrected estimators of reliability.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"720-737"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/66/7b/10.1177_01466216221108131.PMC9574086.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40573822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Item Selection With Collaborative Filtering in On-The-Fly Multistage Adaptive Testing
Pub Date: 2022-11-01 (Epub 2022-08-28). DOI: 10.1177/01466216221124089
Jiaying Xiao, Okan Bulut
An important design feature in the implementation of both computerized adaptive testing and multistage adaptive testing is the use of an appropriate method for item selection. The item selection method is expected to select the optimal items for the examinees' ability level while considering other design features (e.g., item exposure and item bank utilization). This study introduced collaborative filtering (CF) as a new method for item selection in the on-the-fly assembled multistage adaptive testing framework. The user-based CF (UBCF) and item-based CF (IBCF) methods were compared to the maximum Fisher information method with respect to the accuracy of ability estimation, item exposure rates, and item bank utilization under different test conditions (e.g., item bank size, test length, and the sparseness of training data). The simulation results indicated that the UBCF method outperformed the traditional item selection method in measurement accuracy, whereas the IBCF method showed the best performance in item bank utilization. Limitations of the current study and directions for future research are discussed.
{"title":"Item Selection With Collaborative Filtering in On-The-Fly Multistage Adaptive Testing.","authors":"Jiaying Xiao, Okan Bulut","doi":"10.1177/01466216221124089","DOIUrl":"https://doi.org/10.1177/01466216221124089","url":null,"abstract":"<p><p>An important design feature in the implementation of both computerized adaptive testing and multistage adaptive testing is the use of an appropriate method for item selection. The item selection method is expected to select the most optimal items depending on the examinees' ability level while considering other design features (e.g., item exposure and item bank utilization). This study introduced collaborative filtering (CF) as a new method for item selection in the <i>on-the-fly assembled multistage adaptive testing</i> framework. The user-based CF (UBCF) and item-based CF (IBCF) methods were compared to the maximum Fisher information method based on the accuracy of ability estimation, item exposure rates, and item bank utilization under different test conditions (e.g., item bank size, test length, and the sparseness of training data). The simulation results indicated that the UBCF method outperformed the traditional item selection methods regarding measurement accuracy. Also, the IBCF method showed the most superior performance in terms of item bank utilization. Limitations of the current study and the directions for future research are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"690-704"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/09/ba/10.1177_01466216221124089.PMC9574085.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40656645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flexible Item Response Models for Count Data: The Count Thresholds Model
Pub Date: 2022-11-01 (Epub 2022-08-07). DOI: 10.1177/01466216221108124
Gerhard Tutz
A new item response theory model for count data is introduced. In contrast to models in common use, it does not assume a fixed distribution for the responses, as, for example, the Poisson count model and its extensions do. The distribution of responses is determined by difficulty functions that reflect the characteristics of items in a flexible way. Sparse parameterizations are obtained by choosing fixed parametric difficulty functions; more general versions use an approximation by basis functions. The model can be seen as constructed from binary response models, such as the Rasch model or the normal-ogive model, to which it reduces if responses are dichotomized. It is demonstrated that the model competes well with advanced count data models. Simulations demonstrate that parameters and response distributions are recovered well. An application shows the flexibility of the model in accounting for strongly varying distributions of responses.
{"title":"Flexible Item Response Models for Count Data: The Count Thresholds Model.","authors":"Gerhard Tutz","doi":"10.1177/01466216221108124","DOIUrl":"10.1177/01466216221108124","url":null,"abstract":"<p><p>A new item response theory model for count data is introduced. In contrast to models in common use, it does not assume a fixed distribution for the responses as, for example, the Poisson count model and extensions do. The distribution of responses is determined by difficulty functions which reflect the characteristics of items in a flexible way. Sparse parameterizations are obtained by choosing fixed parametric difficulty functions, more general versions use an approximation by basis functions. The model can be seen as constructed from binary response models as the Rasch model or the normal-ogive model to which it reduces if responses are dichotomized. It is demonstrated that the model competes well with advanced count data models. Simulations demonstrate that parameters and response distributions are recovered well. An application shows the flexibility of the model to account for strongly varying distributions of responses.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"643-661"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9574081/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40573824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Empirical Identification Issue of the Bifactor Item Response Theory Model
Pub Date: 2022-11-01 (Epub 2022-07-10). DOI: 10.1177/01466216221108133
Wenya Chen, Ken A Fujimoto
Using the bifactor item response theory model to analyze data arising from educational and psychological studies has gained popularity over the years. Unfortunately, using this model in practice comes with challenges. One such challenge is an empirical identification issue that is seldom discussed in the literature, and its impact on the estimates of the bifactor model's parameters has not been demonstrated. This issue occurs when an item's discriminations on the general and specific dimensions are approximately equal (i.e., the within-item discriminations are similar in strength), making it difficult to obtain unique estimates for those discriminations. We conducted three simulation studies to demonstrate that within-item discriminations of similar strength create problems for estimation stability. The results suggest that a large sample could alleviate but not resolve the problems, at least for sample sizes up to 4,000. When the discriminations within items were clearly different, their estimates were more consistent across the data replicates than when the within-item discriminations were similar. The results also show that the similarity of an item's discriminatory magnitudes on different dimensions has direct implications for the sample size needed to consistently obtain accurate parameter estimates. Although our goal was to provide evidence of the empirical identification issue, the study further reveals that the extent of similarity of within-item discriminations, the magnitude of the discriminations, and how well the items are targeted to the respondents also affect the estimation of the bifactor model's parameters.
{"title":"An Empirical Identification Issue of the Bifactor Item Response Theory Model.","authors":"Wenya Chen, Ken A Fujimoto","doi":"10.1177/01466216221108133","DOIUrl":"10.1177/01466216221108133","url":null,"abstract":"<p><p>Using the bifactor item response theory model to analyze data arising from educational and psychological studies has gained popularity over the years. Unfortunately, using this model in practice comes with challenges. One such challenge is an empirical identification issue that is seldom discussed in the literature, and its impact on the estimates of the bifactor model's parameters has not been demonstrated. This issue occurs when an item's discriminations on the general and specific dimensions are approximately equal (i.e., the within-item discriminations are similar in strength), leading to difficulties in obtaining unique estimates for those discriminations. We conducted three simulation studies to demonstrate that within-item discriminations being similar in strength creates problems in estimation stability. The results suggest that a large sample could alleviate but not resolve the problems, at least when considering sample sizes up to 4,000. When the discriminations within items were made clearly different, the estimates of these discriminations were more consistent across the data replicates than that observed when the discriminations within the items were similar. The results also show that the similarity of an item's discriminatory magnitudes on different dimensions has direct implications on the sample size needed in order to consistently obtain accurate parameter estimates. Although our goal was to provide evidence of the empirical identification issue, the study further reveals that the extent of similarity of within-item discriminations, the magnitude of discriminations, and how well the items are targeted to the respondents also play factors in the estimation of the bifactor model's parameters.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"675-689"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9574084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40656647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modified Item-Fit Indices for Dichotomous IRT Models with Missing Data
Pub Date: 2022-11-01 (Epub 2022-09-19). DOI: 10.1177/01466216221125176
Xue Zhang, Chun Wang
Item-level fit analysis not only serves as a complementary check to global fit analysis; it is also essential in scale development, because the fit results guide item revision and/or deletion (Liu & Maydeu-Olivares, 2014). During data collection, missing responses are likely to occur for various reasons. Chi-square-based item fit indices (e.g., Yen's Q1, McKinley and Mills' G2, and Orlando and Thissen's S-X2 and S-G2) are the most widely used statistics to assess item-level fit. However, the total scores used for grouping in S-X2 and S-G2 play a different role when data are incomplete than when they are complete, so S-X2 and S-G2 cannot handle incomplete data directly. To this end, we propose several modified versions of S-X2 and S-G2 to evaluate item-level fit when response data are incomplete, named M_impute-X2 and M_impute-G2, where the subscript "impute" denotes the imputation method. Instead of using observed total scores for grouping, the new indices rely on total scores imputed by either a single imputation method or one of three multiple imputation methods (i.e., two-way imputation with normally distributed errors, corrected item-mean substitution with normally distributed errors, and response function imputation). The new indices are equivalent to S-X2 and S-G2 when response data are complete. Their performances are evaluated and compared via simulation studies; the manipulated factors include test length, source of misfit, misfit proportion, and missing proportion. The results of the simulation studies are consistent with those of Orlando and Thissen (2000, 2003), and different indices are recommended under different conditions.
{"title":"Modified Item-Fit Indices for Dichotomous IRT Models with Missing Data.","authors":"Xue Zhang, Chun Wang","doi":"10.1177/01466216221125176","DOIUrl":"10.1177/01466216221125176","url":null,"abstract":"<p><p>Item-level fit analysis not only serves as a complementary check to global fit analysis, it is also essential in scale development because the fit results will guide item revision and/or deletion (Liu & Maydeu-Olivares, 2014). During data collection, missing response data may likely happen due to various reasons. Chi-square-based item fit indices (e.g., Yen's <i>Q</i> <sub><i>1</i></sub> , McKinley and Mill's <i>G</i> <sup><i>2</i></sup> , Orlando and Thissen's <i>S-X</i> <sup><i>2</i></sup> and <i>S-G</i> <sup><i>2</i></sup> ) are the most widely used statistics to assess item-level fit. However, the role of total scores with complete data used in <i>S-X</i> <sup><i>2</i></sup> and <i>S-G</i> <sup><i>2</i></sup> is different from that with incomplete data. As a result, <i>S-X</i> <sup><i>2</i></sup> and <i>S-G</i> <sup><i>2</i></sup> cannot handle incomplete data directly. To this end, we propose several modified versions of <i>S-X</i> <sup><i>2</i></sup> and <i>S-G</i> <sup><i>2</i></sup> to evaluate item-level fit when response data are incomplete, named as <i>M</i> <sub><i>impute</i></sub> <i>-X</i> <sup><i>2</i></sup> and <i>M</i> <sub><i>impute</i></sub> <i>-G</i> <sup><i>2</i></sup> , of which the subscript \"<i>impute</i>\" denotes different imputation methods. Instead of using observed total scores for grouping, the new indices rely on imputed total scores by either a single imputation method or three multiple imputation methods (i.e., two-way with normally distributed errors, corrected item-mean substitution with normally distributed errors and response function imputation). The new indices are equivalent to <i>S-X</i> <sup><i>2</i></sup> and <i>S-G</i> <sup><i>2</i></sup> when response data are complete. Their performances are evaluated and compared via simulation studies; the manipulated factors include test length, sources of misfit, misfit proportion, and missing proportion. The results from simulation studies are consistent with those of Orlando and Thissen (2000, 2003), and different indices are recommended under different conditions.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"705-719"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9574083/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40656646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diagnostic Classification Models for a Mixture of Ordered and Non-ordered Response Options in Rating Scales
Pub Date: 2022-10-01 (Epub 2022-06-24). DOI: 10.1177/01466216221108132
Ren Liu, Haiyan Liu, Dexin Shi, Zhehan Jiang
When developing ordinal rating scales, we may include potentially unordered response options such as "Neither Agree nor Disagree," "Neutral," "Don't Know," "No Opinion," or "Hard to Say." To handle responses to a mixture of ordered and unordered options, Huggins-Manley et al. (2018) proposed a class of semi-ordered models under the unidimensional item response theory framework. This study extends the concept of semi-ordered models into the area of diagnostic classification models. Specifically, we propose a flexible framework of semi-ordered DCMs that accommodates most earlier DCMs and allows for analyzing the relationship between those potentially unordered responses and the measured traits. Results from an operational study and two simulation studies show that the proposed framework can incorporate both ordered and non-ordered responses into the estimation of the latent traits and thus provide useful information about both the items and the respondents.
{"title":"Diagnostic Classification Models for a Mixture of Ordered and Non-ordered Response Options in Rating Scales.","authors":"Ren Liu, Haiyan Liu, Dexin Shi, Zhehan Jiang","doi":"10.1177/01466216221108132","DOIUrl":"10.1177/01466216221108132","url":null,"abstract":"<p><p>When developing ordinal rating scales, we may include potentially unordered response options such as \"Neither Agree nor Disagree,\" \"Neutral,\" \"Don't Know,\" \"No Opinion,\" or \"Hard to Say.\" To handle responses to a mixture of ordered and unordered options, Huggins-Manley et al. (2018) proposed a class of semi-ordered models under the unidimensional item response theory framework. This study extends the concept of semi-ordered models into the area of diagnostic classification models. Specifically, we propose a flexible framework of semi-ordered DCMs that accommodates most earlier DCMs and allows for analyzing the relationship between those potentially unordered responses and the measured traits. Results from an operational study and two simulation studies show that the proposed framework can incorporate both ordered and non-ordered responses into the estimation of the latent traits and thus provide useful information about both the items and the respondents.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 7","pages":"622-639"},"PeriodicalIF":1.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/f0/84/10.1177_01466216221108132.PMC9483220.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33466446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Optimal Design of Bifactor Multidimensional Computerized Adaptive Testing with Mixed-format Items
Pub Date: 2022-10-01 (Epub 2022-06-14). DOI: 10.1177/01466216221108382
Xiuzhen Mao, Jiahui Zhang, Tao Xin
Multidimensional computerized adaptive testing (MCAT) using mixed-format items holds great potential for next-generation assessments. Two critical factors in mixed-format test design (the order and proportion of polytomous items) and item selection were addressed in the context of mixed-format bifactor MCAT. For item selection, this article presents the derivation of the Fisher information matrix of the bifactor graded response model and the application of the bifactor dimension-reduction method to simplify the computation of the mutual information (MI) item selection method. In a simulation study, different MCAT designs were compared across varying proportions of polytomous items (0.2-0.6, 1), different item-delivering formats (DPmix: delivering polytomous items at the final stage; RPmix: delivering them at random positions), three bifactor patterns (low, middle, and high), and two item selection methods (Bayesian D-optimality and MI). Simulation results suggested that (a) the overall estimation precision increased with a higher bifactor pattern; (b) the two item selection methods did not differ substantially in estimation precision; and (c) the RPmix format always led to more precise interim and final estimates than the DPmix format. Polytomous-item proportions of 0.3 and 0.4 were recommended for the RPmix and DPmix formats, respectively.
{"title":"The Optimal Design of Bifactor Multidimensional Computerized Adaptive Testing with Mixed-format Items.","authors":"Xiuzhen Mao, Jiahui Zhang, Tao Xin","doi":"10.1177/01466216221108382","DOIUrl":"10.1177/01466216221108382","url":null,"abstract":"<p><p>Multidimensional computerized adaptive testing (MCAT) using mixed-format items holds great potential for the next-generation assessments. Two critical factors in the mixed-format test design (i.e., the order and proportion of polytomous items) and item selection were addressed in the context of mixed-format bifactor MCAT. For item selection, this article presents the derivation of the Fisher information matrix of the bifactor graded response model and the application of the bifactor dimension reduction method to simplify the computation of the mutual information (MI) item selection method. In a simulation study, different MCAT designs were compared with varying proportions of polytomous items (0.2-0.6, 1), different item-delivering formats (DPmix: delivering polytomous items at the final stage; RPmix: random delivering), three bifactor patterns (low, middle, and high), and two item selection methods (Bayesian D-optimality and MI). Simulation results suggested that a) the overall estimation precision increased with a higher bifactor pattern; b) the two item selection methods did not show substantial differences in estimation precision; and c) the RPmix format always led to more precise interim and final estimates than the DPmix format. The proportions of 0.3 and 0.4 were recommended for the RPmix and DPmix formats, respectively.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 7","pages":"605-621"},"PeriodicalIF":1.2,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9483217/pdf/10.1177_01466216221108382.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33466926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Uncovering the Complexity of Item Position Effects in a Low-Stakes Testing Context
Pub Date: 2022-10-01 (Epub 2022-07-04). DOI: 10.1177/01466216221108134
Thai Q Ong, Dena A Pastor
Previous researchers have adopted either an item or an examinee perspective on position effects, exploring the relationships between position effects and item or examinee variables separately. In contrast, we adopted an integrated perspective, exploring the relationships among position effects, item variables, and examinee variables simultaneously. We evaluated the degree to which position effects on two separate low-stakes tests, administered to two different samples, were moderated by item variables (item length, number of response options, mental taxation, and the presence of a graphic) and examinee variables (effort, change in effort, and gender). Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item. Longer items were more prone to position effects than shorter items; however, the level of mental taxation required to answer the item, the presence of a graphic, and the number of response options were not related to position effects. Examinee effort levels, changes in effort patterns, and gender did not moderate the relationships between position effects and item features.
{"title":"Uncovering the Complexity of Item Position Effects in a Low-Stakes Testing Context.","authors":"Thai Q Ong, Dena A Pastor","doi":"10.1177/01466216221108134","DOIUrl":"10.1177/01466216221108134","url":null,"abstract":"<p><p>Previous researchers have only either adopted an item or examinee perspective to position effects, where they focused on exploring the relationships among position effects and item or examinee variables separately. Unlike previous researchers, we adopted an integrated perspective to position effects, where we focused on exploring the relationships among position effects, item variables, and examinee variables simultaneously. We evaluated the degree to which position effects on two separate low-stakes tests administered to two different samples were moderated by different item (item length, number of response options, mental taxation, and graphic) and examinee (effort, change in effort, and gender) variables. Items exhibited significant negative linear position effects on both tests, with the magnitude of the position effects varying from item to item. Longer items were more prone to position effects than shorter items; however, the level of mental taxation required to answer the item, the presence of a graphic, and the number of response options were not related to position effects. Examinee effort levels, change in effort patterns, and gender did not moderate the relationships among position effects and item features.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 7","pages":"571-588"},"PeriodicalIF":1.2,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9483218/pdf/10.1177_01466216221108134.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33466447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}