Termination Criteria for Grid Multiclassification Adaptive Testing With Multidimensional Polytomous Items.
Pub Date: 2022-10-01 | Epub Date: 2022-06-16 | DOI: 10.1177/01466216221108383
Zhuoran Wang, Chun Wang, David J Weiss
Adaptive classification testing (ACT) is a variation of computerized adaptive testing (CAT) developed to efficiently classify examinees into multiple groups based on predetermined cutoffs. In multidimensional multiclassification (i.e., when more than two categories exist along each dimension), grid classification is proposed to classify each examinee into one of the grids bounded by the cutoffs (lines/surfaces) along the different dimensions, so as to provide clearer information about an examinee's relative standing on each dimension and to facilitate subsequent treatment and intervention. In this article, the sequential probability ratio test (SPRT) and the confidence interval method were implemented in grid multiclassification ACT. In addition, two new termination criteria, the grid classification generalized likelihood ratio (GGLR) and the simplified grid classification generalized likelihood ratio, were proposed for grid multiclassification ACT. Simulation studies using both a simulated item bank and a real item bank with polytomous multidimensional items show that grid multiclassification ACT is more efficient than classification based on measurement CAT, which focuses on trait estimate precision. With a high-quality bank, GGLR terminated the grid multiclassification ACT and classified examinees most efficiently.
{"title":"Termination Criteria for Grid Multiclassification Adaptive Testing With Multidimensional Polytomous Items.","authors":"Zhuoran Wang, Chun Wang, David J Weiss","doi":"10.1177/01466216221108383","DOIUrl":"10.1177/01466216221108383","url":null,"abstract":"<p><p>Adaptive classification testing (ACT) is a variation of computerized adaptive testing (CAT) that is developed to efficiently classify examinees into multiple groups based on predetermined cutoffs. In multidimensional multiclassification (i.e., more than two categories exist along each dimension), grid classification is proposed to classify each examinee into one of the grids encircled by cutoffs (lines/surfaces) along different dimensions so as to provide clearer information regarding an examinee's relative standing along each dimension and facilitate subsequent treatment and intervention. In this article, the sequential probability ratio test (SPRT) and confidence interval method were implemented in the grid multiclassification ACT. In addition, two new termination criteria, the grid classification generalized likelihood ratio (GGLR) and simplified grid classification generalized likelihood ratio were proposed for grid multiclassification ACT. Simulation studies, using a simulated item bank, and a real item bank with polytomous multidimensional items, show that grid multiclassification ACT is more efficient than classification based on measurement CAT that focuses on trait estimate precision. In the context of a high-quality bank, GGLR was found to most efficiently terminate the grid multiclassification ACT and classify examinees.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 7","pages":"551-570"},"PeriodicalIF":1.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9483219/pdf/10.1177_01466216221108383.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33466449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating the Effect of Differential Rapid Guessing on Population Invariance in Equating.
Pub Date: 2022-10-01 | Epub Date: 2022-06-16 | DOI: 10.1177/01466216221108991
Jiayi Deng, Joseph A Rios
Score equating is an essential tool in improving the fairness of test score interpretations when employing multiple test forms. To ensure that the equating functions used to connect scores from one form to another are valid, they must be invariant across different populations of examinees. Given that equating is used in many low-stakes testing programs, examinees' test-taking effort should be considered carefully when evaluating population invariance in equating, particularly as the occurrence of rapid guessing (RG) has been found to differ across subgroups. To this end, the current study investigated whether differential RG rates between subgroups can lead to incorrect inferences concerning population invariance in test equating. A simulation was built to generate data for two examinee subgroups (one more motivated than the other) administered two alternative forms of multiple-choice items. The rate of RG and ability characteristics of rapid guessers were manipulated. Results showed that as RG responses increased, false positive and false negative inferences of equating invariance were respectively observed at the lower and upper ends of the observed score scale. This result was supported by an empirical analysis of an international assessment. These findings suggest that RG should be investigated and documented prior to test equating, especially in low-stakes assessment contexts. A failure to do so may lead to incorrect inferences concerning fairness in equating.
{"title":"Investigating the Effect of Differential Rapid Guessing on Population Invariance in Equating.","authors":"Jiayi Deng, Joseph A Rios","doi":"10.1177/01466216221108991","DOIUrl":"10.1177/01466216221108991","url":null,"abstract":"<p><p>Score equating is an essential tool in improving the fairness of test score interpretations when employing multiple test forms. To ensure that the equating functions used to connect scores from one form to another are valid, they must be invariant across different populations of examinees. Given that equating is used in many low-stakes testing programs, examinees' test-taking effort should be considered carefully when evaluating population invariance in equating, particularly as the occurrence of rapid guessing (RG) has been found to differ across subgroups. To this end, the current study investigated whether differential RG rates between subgroups can lead to incorrect inferences concerning population invariance in test equating. A simulation was built to generate data for two examinee subgroups (one more motivated than the other) administered two alternative forms of multiple-choice items. The rate of RG and ability characteristics of rapid guessers were manipulated. Results showed that as RG responses increased, false positive and false negative inferences of equating invariance were respectively observed at the lower and upper ends of the observed score scale. This result was supported by an empirical analysis of an international assessment. These findings suggest that RG should be investigated and documented prior to test equating, especially in low-stakes assessment contexts. A failure to do so may lead to incorrect inferences concerning fairness in equating.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 7","pages":"589-604"},"PeriodicalIF":1.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9483216/pdf/10.1177_01466216221108991.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33466450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multistage Testing in Heterogeneous Populations: Some Design and Implementation Considerations.
Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108123
Leslie Rutkowski, Yuan-Ling Liaw, Dubravka Svetina, David Rutkowski
A central challenge in international large-scale assessments is adequately measuring dozens of highly heterogeneous populations, many of which are low performing. To that end, multistage adaptive testing offers one possibility for better assessment across the achievement continuum. This study examines how several multistage test design and implementation choices can affect measurement performance in this setting. To address gaps in the knowledge base, we extended previous research to include multiple, linked panels, more appropriate estimates of achievement, and multiple populations of varied proficiency. Using achievement distributions and associated item parameters from varied populations, we design and execute a simulation study that mimics an established international assessment. We compare several routing schemes and varied module lengths in terms of item and person parameter recovery. Our findings suggest that, particularly for low performing populations, multistage testing offers precision advantages. Further, findings indicate that equal module lengths (desirable for controlling position effects) and classical routing methods, which lower the technological burden of implementing such a design, produce good results. Finally, probabilistic misrouting offers advantages over merit routing for controlling bias in item and person parameters. Overall, multistage testing shows promise for extending the scope of international assessments. We discuss the importance of our findings for operational work in the international assessment domain.
{"title":"Multistage Testing in Heterogeneous Populations: Some Design and Implementation Considerations.","authors":"Leslie Rutkowski, Yuan-Ling Liaw, Dubravka Svetina, David Rutkowski","doi":"10.1177/01466216221108123","DOIUrl":"https://doi.org/10.1177/01466216221108123","url":null,"abstract":"<p><p>A central challenge in international large-scale assessments is adequately measuring dozens of highly heterogeneous populations, many of which are low performers. To that end, multistage adaptive testing offers one possibility for better assessing across the achievement continuum. This study examines the way that several multistage test design and implementation choices can impact measurement performance in this setting. To attend to gaps in the knowledge base, we extended previous research to include multiple, linked panels, more appropriate estimates of achievement, and multiple populations of varied proficiency. Including achievement distributions from varied populations and associated item parameters, we design and execute a simulation study that mimics an established international assessment. We compare several routing schemes and varied module lengths in terms of item and person parameter recovery. Our findings suggest that, particularly for low performing populations, multistage testing offers precision advantages. Further, findings indicate that equal module lengths-desirable for controlling position effects-and classical routing methods, which lower the technological burden of implementing such a design, produce good results. Finally, probabilistic misrouting offers advantages over merit routing for controlling bias in item and person parameters. Overall, multistage testing shows promise for extending the scope of international assessments. We discuss the importance of our findings for operational work in the international assessment domain.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 6","pages":"494-508"},"PeriodicalIF":1.2,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382094/pdf/10.1177_01466216221108123.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10189453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing Sampling Variability for Item Response Theory Scale Scores in a Fixed-Parameter Calibrated Projection Design.
Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108136
Shuangshuang Xu, Yang Liu
A common linking practice uses estimated item parameters to calculate projected scores. This procedure, however, fails to account for carry-over sampling variability. Neglecting sampling variability could consequently lead to understated uncertainty for Item Response Theory (IRT) scale scores. To address the issue, we apply a Multiple Imputation (MI) approach to adjust the Posterior Standard Deviations of IRT scale scores. The MI procedure involves drawing multiple sets of plausible values from an approximate sampling distribution of the estimated item parameters. When the two scales to be linked were previously calibrated, item parameters can be fixed at their original published scales, and the latent variable means and covariances of the two scales can then be estimated conditional on the fixed item parameters. This conditional estimation procedure is a special case of Restricted Recalibration (RR), in which the asymptotic sampling distribution of the estimated parameters follows from the general theory of pseudo Maximum Likelihood (ML) estimation. We evaluate the combination of RR and MI with a simulation study examining the impact of carry-over sampling variability under various conditions. We also illustrate how to apply the proposed method to real data by revisiting Thissen et al. (2015).
{"title":"Characterizing Sampling Variability for Item Response Theory Scale Scores in a Fixed-Parameter Calibrated Projection Design.","authors":"Shuangshuang Xu, Yang Liu","doi":"10.1177/01466216221108136","DOIUrl":"https://doi.org/10.1177/01466216221108136","url":null,"abstract":"<p><p>A common practice of linking uses estimated item parameters to calculate projected scores. This procedure fails to account for the carry-over sampling variability. Neglecting sampling variability could consequently lead to understated uncertainty for Item Response Theory (IRT) scale scores. To address the issue, we apply a Multiple Imputation (MI) approach to adjust the Posterior Standard Deviations of IRT scale scores. The MI procedure involves drawing multiple sets of plausible values from an approximate sampling distribution of the estimated item parameters. When two scales to be linked were previously calibrated, item parameters can be fixed at their original published scales, and the latent variable means and covariances of the two scales can then be estimated conditional on the fixed item parameters. The conditional estimation procedure is a special case of Restricted Recalibration (RR), in which the asymptotic sampling distribution of estimated parameters follows from the general theory of pseudo Maximum Likelihood (ML) estimation. We evaluate the combination of RR and MI by a simulation study to examine the impact of carry-over sampling variability under various simulation conditions. We also illustrate how to apply the proposed method to real data by revisiting Thissen et al. (2015).</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 6","pages":"509-528"},"PeriodicalIF":1.2,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382091/pdf/10.1177_01466216221108136.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10133732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application of Sampling Variance of Item Response Theory Parameter Estimates in Detecting Outliers in Common Item Equating.
Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108122
Chunyan Liu, Daniel Jurich
In common item equating, the presence of item outliers can reduce the accuracy of equating results and have significant ramifications for the validity of test score interpretations. Therefore, common item equating should involve a screening process to flag outlying items and exclude them from the common item set before equating is conducted. The current simulation study demonstrated that the sampling variance associated with item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed that the proposed sampling variance statistic (SV) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 across a variety of evaluation criteria. Given these favorable results, item outlier detection statistics based on estimated sampling variability warrant further consideration in both research and practice.
{"title":"Application of Sampling Variance of Item Response Theory Parameter Estimates in Detecting Outliers in Common Item Equating.","authors":"Chunyan Liu, Daniel Jurich","doi":"10.1177/01466216221108122","DOIUrl":"https://doi.org/10.1177/01466216221108122","url":null,"abstract":"<p><p>In common item equating, the existence of item outliers may impact the accuracy of equating results and bring significant ramifications to the validity of test score interpretations. Therefore, common item equating should involve a screening process to flag outlying items and exclude them from the common item set before equating is conducted. The current simulation study demonstrated that the sampling variance associated with the item response theory (IRT) item parameter estimates can help detect outliers in the common items under the 2-PL and 3-PL IRT models. The results showed the proposed sampling variance statistic (<i>SV</i>) outperformed the traditional displacement method with cutoff values of 0.3 and 0.5 along a variety of evaluation criteria. Based on the favorable results, item outlier detection statistics based on estimated sampling variability warrant further consideration in both research and practice.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 6","pages":"529-547"},"PeriodicalIF":1.2,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382092/pdf/10.1177_01466216221108122.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10487809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two New Models for Item Preknowledge.
Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108130
Kylie Gorney, James A Wollack
To evaluate preknowledge detection methods, researchers often conduct simulation studies in which they use models to generate the data. In this article, we propose two new models to represent item preknowledge. Unlike existing models, we allow the impact of preknowledge to vary across persons and items in order to better represent situations encountered in practice. We use three real data sets to evaluate the fit of the new models with respect to two types of preknowledge: items only, and items plus the correct answer key. Results show that the two new models provide the best fit compared to several other existing preknowledge models. Furthermore, model parameter estimates were found to vary substantially depending on the type of preknowledge being considered, indicating that answer key disclosure has a profound impact on testing behavior.
{"title":"Two New Models for Item Preknowledge.","authors":"Kylie Gorney, James A Wollack","doi":"10.1177/01466216221108130","DOIUrl":"https://doi.org/10.1177/01466216221108130","url":null,"abstract":"<p><p>To evaluate preknowledge detection methods, researchers often conduct simulation studies in which they use models to generate the data. In this article, we propose two new models to represent item preknowledge. Contrary to existing models, we allow the impact of preknowledge to vary across persons and items in order to better represent situations that are encountered in practice. We use three real data sets to evaluate the fit of the new models with respect to two types of preknowledge: items only, and items and the correct answer key. Results show that the two new models provide the best fit compared to several other existing preknowledge models. Furthermore, model parameter estimates were found to vary substantially depending on the type of preknowledge being considered, indicating that answer key disclosure has a profound impact on testing behavior.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 6","pages":"447-461"},"PeriodicalIF":1.2,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382093/pdf/10.1177_01466216221108130.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10487814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Item-Fit Statistic Based on Posterior Probabilities of Membership in Ability Groups.
Pub Date: 2022-09-01 | DOI: 10.1177/01466216221108061
Bartosz Kondratek
A novel approach to item-fit analysis based on an asymptotic test is proposed. The new test statistic, $\chi_w^2$, compares pseudo-observed and expected item mean scores over a set of ability bins. The item mean scores are computed as weighted means, with weights based on test-takers' a posteriori density of ability within the bin. This article explores the properties of $\chi_w^2$ in the case of dichotomously scored items for unidimensional IRT models. Monte Carlo experiments were conducted to analyze the performance of $\chi_w^2$. Type I error of $\chi_w^2$ was acceptably close to the nominal level, and it had greater power than Orlando and Thissen's $S-X^2$. Under some conditions, the power of $\chi_w^2$ also exceeded that reported for the computationally more demanding Stone's $\chi^{2*}$.
{"title":"Item-Fit Statistic Based on Posterior Probabilities of Membership in Ability Groups.","authors":"Bartosz Kondratek","doi":"10.1177/01466216221108061","DOIUrl":"https://doi.org/10.1177/01466216221108061","url":null,"abstract":"<p><p>A novel approach to item-fit analysis based on an asymptotic test is proposed. The new test statistic, <math> <mrow><msubsup><mi>χ</mi> <mi>w</mi> <mn>2</mn></msubsup> </mrow> </math> , compares pseudo-observed and expected item mean scores over a set of ability bins. The item mean scores are computed as weighted means with weights based on test-takers' <i>a posteriori</i> density of ability within the bin. This article explores the properties of <math> <mrow><msubsup><mi>χ</mi> <mi>w</mi> <mn>2</mn></msubsup> </mrow> </math> in case of dichotomously scored items for unidimensional IRT models. Monte Carlo experiments were conducted to analyze the performance of <math> <mrow><msubsup><mi>χ</mi> <mi>w</mi> <mn>2</mn></msubsup> </mrow> </math> . Type I error of <math> <mrow><msubsup><mi>χ</mi> <mi>w</mi> <mn>2</mn></msubsup> <mo> </mo></mrow> </math> was acceptably close to the nominal level and it had greater power than Orlando and Thissen's <math><mrow><mi>S</mi> <mo>-</mo> <msup><mi>x</mi> <mn>2</mn></msup> </mrow> </math> . Under some conditions, power of <math> <mrow><msubsup><mi>χ</mi> <mi>w</mi> <mn>2</mn></msubsup> </mrow> </math> also exceeded the one reported for the computationally more demanding Stone's <math> <mrow><msup><mi>χ</mi> <mrow><mn>2</mn> <mo>∗</mo></mrow> </msup> </mrow> </math> .</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 6","pages":"462-478"},"PeriodicalIF":1.2,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382089/pdf/10.1177_01466216221108061.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10132911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Item Response Theory True Score Equating for the Bifactor Model Under the Common-Item Nonequivalent Groups Design.
Pub Date: 2022-09-01 | Epub Date: 2022-06-17 | DOI: 10.1177/01466216221108995
Kyung Yong Kim
Applying item response theory (IRT) true score equating to multidimensional IRT models is not straightforward due to the one-to-many relationship between a true score and latent variables. Under the common-item nonequivalent groups design, the purpose of the current study was to introduce two IRT true score equating procedures that adopted different dimension reduction strategies for the bifactor model. The first procedure, which was referred to as the integration procedure, linked the latent variable scales for the bifactor model and integrated out the specific factors from the item response function of the bifactor model. Then, IRT true score equating was applied to the marginalized bifactor model. The second procedure, which was referred to as the PIRT-based procedure, projected the specific dimensions onto the general dimension to obtain a locally dependent unidimensional IRT (UIRT) model and linked the scales of the UIRT model, followed by the application of IRT true score equating to the locally dependent UIRT model. Equating results obtained with the two equating procedures along with those obtained with the unidimensional three-parameter logistic (3PL) model were compared using both simulated and real data. In general, the integration and PIRT-based procedures provided equating results that were not practically different. Furthermore, the equating results produced by the two bifactor-based procedures became more accurate than the results returned by the 3PL model as tests became more multidimensional.
{"title":"Item Response Theory True Score Equating for the Bifactor Model Under the Common-Item Nonequivalent Groups Design.","authors":"Kyung Yong Kim","doi":"10.1177/01466216221108995","DOIUrl":"10.1177/01466216221108995","url":null,"abstract":"<p><p>Applying item response theory (IRT) true score equating to multidimensional IRT models is not straightforward due to the one-to-many relationship between a true score and latent variables. Under the common-item nonequivalent groups design, the purpose of the current study was to introduce two IRT true score equating procedures that adopted different dimension reduction strategies for the bifactor model. The first procedure, which was referred to as the integration procedure, linked the latent variable scales for the bifactor model and integrated out the specific factors from the item response function of the bifactor model. Then, IRT true score equating was applied to the marginalized bifactor model. The second procedure, which was referred to as the PIRT-based procedure, projected the specific dimensions onto the general dimension to obtain a locally dependent unidimensional IRT (UIRT) model and linked the scales of the UIRT model, followed by the application of IRT true score equating to the locally dependent UIRT model. Equating results obtained with the two equating procedures along with those obtained with the unidimensional three-parameter logistic (3PL) model were compared using both simulated and real data. In general, the integration and PIRT-based procedures provided equating results that were not practically different. Furthermore, the equating results produced by the two bifactor-based procedures became more accurate than the results returned by the 3PL model as tests became more multidimensional.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 6","pages":"479-493"},"PeriodicalIF":1.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382090/pdf/10.1177_01466216221108995.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10189451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
glca: An R Package for Multiple-Group Latent Class Analysis.
Pub Date: 2022-07-01 | Epub Date: 2022-05-11 | DOI: 10.1177/01466216221084197
Youngsun Kim, Saebom Jeon, Chi Chang, Hwan Chung
Group similarities and differences may manifest themselves in a variety of ways in multiple-group latent class analysis (LCA). Sometimes, measurement models are identical across groups in LCA. In other situations, the measurement models may differ, suggesting that the latent structure itself is different between groups. Tests of measurement invariance shed light on this distinction. We created an R package, glca, that implements procedures for exploring differences in latent class structure between populations, taking the multilevel data structure into account. The glca package deals with fixed-effect LCA and nonparametric random-effect LCA; the former can be applied when populations are segmented by the observed group variable itself, whereas the latter can be used when the group variable has too many levels to make meaningful group comparisons, by identifying a group-level latent variable. The glca package consists of functions for statistical test procedures for exploring group differences in various LCA models while considering the multilevel data structure.
{"title":"glca: An R Package for Multiple-Group Latent Class Analysis.","authors":"Youngsun Kim, Saebom Jeon, Chi Chang, Hwan Chung","doi":"10.1177/01466216221084197","DOIUrl":"10.1177/01466216221084197","url":null,"abstract":"<p><p>Group similarities and differences may manifest themselves in a variety of ways in multiple-group latent class analysis (LCA). Sometimes, measurement models are identical across groups in LCA. In other situations, the measurement models may differ, suggesting that the latent structure itself is different between groups. Tests of measurement invariance shed light on this distinction. We created an R package glca that implements procedures for exploring differences in latent class structure between populations, taking multilevel data structure into account. The glca package deals with the fixed-effect LCA and the nonparametric random-effect LCA; the former can be applied in the situation where populations are segmented by the observed group variable itself, whereas the latter can be used when there are too many levels in the group variable to make a meaningful group comparisons by identifying a group-level latent variable. The glca package consists of functions for statistical test procedures for exploring group differences in various LCA models considering multilevel data structure.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 5","pages":"439-441"},"PeriodicalIF":1.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9265491/pdf/10.1177_01466216221084197.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bridging Models of Biometric and Psychometric Assessment: A Three-Way Joint Modeling Approach of Item Responses, Response Times, and Gaze Fixation Counts.
Pub Date: 2022-07-01 | DOI: 10.1177/01466216221089344
Kaiwen Man, Jeffrey R Harring, Peida Zhan
Recently, joint models of item response data and response times have been proposed to better assess and understand test takers' learning processes. This article demonstrates how biometric information such as gaze fixation counts obtained from an eye-tracking machine can be integrated into the measurement model. The proposed joint modeling framework accommodates the relations among a test taker's latent ability, working speed and test engagement level via a person-side variance-covariance structure, while simultaneously permitting the modeling of item difficulty, time-intensity, and the engagement intensity through an item-side variance-covariance structure. A Bayesian estimation scheme is used to fit the proposed model to data. Posterior predictive model checking based on three discrepancy measures corresponding to various model components are introduced to assess model-data fit. Findings from a Monte Carlo simulation and results from analyzing experimental data demonstrate the utility of the model.
{"title":"Bridging Models of Biometric and Psychometric Assessment: A Three-Way Joint Modeling Approach of Item Responses, Response Times, and Gaze Fixation Counts.","authors":"Kaiwen Man, Jeffrey R Harring, Peida Zhan","doi":"10.1177/01466216221089344","DOIUrl":"https://doi.org/10.1177/01466216221089344","url":null,"abstract":"<p><p>Recently, joint models of item response data and response times have been proposed to better assess and understand test takers' learning processes. This article demonstrates how biometric information such as gaze fixation counts obtained from an eye-tracking machine can be integrated into the measurement model. The proposed joint modeling framework accommodates the relations among a test taker's latent ability, working speed and test engagement level via a person-side variance-covariance structure, while simultaneously permitting the modeling of item difficulty, time-intensity, and the engagement intensity through an item-side variance-covariance structure. A Bayesian estimation scheme is used to fit the proposed model to data. Posterior predictive model checking based on three discrepancy measures corresponding to various model components are introduced to assess model-data fit. Findings from a Monte Carlo simulation and results from analyzing experimental data demonstrate the utility of the model.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 5","pages":"361-381"},"PeriodicalIF":1.2,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9265489/pdf/10.1177_01466216221089344.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10091266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}