Enhancing the Detection of Social Desirability Bias Using Machine Learning: A Novel Application of Person-Fit Indices
Pub Date: 2024-05-30 | DOI: 10.1177/00131644241255109
Sanaz Nazari, Walter L. Leite, A. Corinne Huggins-Manley
Social desirability bias (SDB) is a common threat to the validity of conclusions drawn from responses to a scale or survey. There is a wide range of person-fit statistics in the literature that can be employed to detect SDB. In addition, machine learning classifiers, such as logistic regression and random forest, have the potential to distinguish between biased and unbiased responses. This study proposes a new application of these classifiers to detect SDB by using several person-fit indices as features, or predictors, in the machine learning methods. The results of a Monte Carlo simulation study showed that, when a single feature was used, applying person-fit indices directly and using logistic regression led to similar classification results. However, the random forest classifier improved the classification of biased and unbiased responses substantially. Classification improved further in both logistic regression and random forest when multiple features were considered simultaneously. Moreover, cross-validation indicated stable areas under the curve (AUCs) across machine learning classifiers. A didactic illustration of applying random forest to detect SDB is presented.
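The workflow described above (several person-fit indices entering a random forest, with classification quality judged by AUC and checked by cross-validation) can be sketched as follows. This is a minimal illustration rather than the authors' simulation code; the person-fit feature names (lz, U3, Ht), the synthetic stand-in data, and all tuning values are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in: in the study, these columns would be person-fit indices
# computed from simulated biased and unbiased response vectors.
rng = np.random.default_rng(1)
n = 2000
biased = rng.integers(0, 2, n)
df = pd.DataFrame({
    "lz": rng.normal(loc=-0.5 * biased),
    "U3": rng.normal(loc=0.4 * biased),
    "Ht": rng.normal(loc=-0.3 * biased),
    "biased": biased,
})

X, y = df[["lz", "U3", "Ht"]], df["biased"]            # multiple person-fit features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=500, random_state=1)
rf.fit(X_train, y_train)

# Held-out AUC, plus cross-validated AUCs to check stability across folds.
print("test AUC:", round(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]), 3))
print("5-fold CV AUCs:", cross_val_score(rf, X, y, cv=5, scoring="roc_auc").round(3))
```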
{"title":"Enhancing the Detection of Social Desirability Bias Using Machine Learning: A Novel Application of Person-Fit Indices","authors":"Sanaz Nazari, Walter L. Leite, A. Corinne Huggins-Manley","doi":"10.1177/00131644241255109","DOIUrl":"https://doi.org/10.1177/00131644241255109","url":null,"abstract":"Social desirability bias (SDB) is a common threat to the validity of conclusions from responses to a scale or survey. There is a wide range of person-fit statistics in the literature that can be employed to detect SDB. In addition, machine learning classifiers, such as logistic regression and random forest, have the potential to distinguish between biased and unbiased responses. This study proposes a new application of these classifiers to detect SDB by considering several person-fit indices as features or predictors in the machine learning methods. The results of a Monte Carlo simulation study showed that for a single feature, applying person-fit indices directly and logistic regression led to similar classification results. However, the random forest classifier improved the classification of biased and unbiased responses substantially. Classification was improved in both logistic regression and random forest by considering multiple features simultaneously. Moreover, cross-validation indicated stable area under the curves (AUCs) across machine learning classifiers. A didactical illustration of applying random forest to detect SDB is presented.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"2018 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Is Effort Moderated Scoring Robust to Multidimensional Rapid Guessing?
Pub Date: 2024-04-28 | DOI: 10.1177/00131644241246749
Joseph A. Rios, Jiayi Deng
To mitigate the potentially damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e., RG that is linearly related to examinee ability). Specifically, EM scoring is compared with the Holman–Glas (HG) method, a multidimensional scoring approach, in terms of model fit distortion, ability parameter recovery, and omega reliability distortion. Test difficulty, the proportion of RG present within a sample, and the strength of association between ability and RG propensity were manipulated to create 80 total conditions. Overall, the results showed that EM scoring provided improved model fit compared with HG scoring when RG comprised 12% or less of all item responses. Furthermore, no significant differences in ability parameter recovery and omega reliability distortion were noted when comparing these two scoring approaches under moderate degrees of RG multidimensionality. These differences were small largely because RG had limited impact on aggregated ability estimates (bias ranged from 0.00 to 0.05 logits) and reliability (distortion was ≤ .005 units) even when as much as 40% of item responses in the sample data reflected RG behavior.
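For reference, one common way to write the unidimensional effort-moderated model is shown below, here with a 2PL kernel; the specific kernel and the response-time threshold rule are assumptions made for illustration, not details taken from this article. Responses flagged as rapid guesses contribute only a chance-level term and therefore carry no information about ability.

```latex
% Delta_{ij} = 1 if examinee j's response to item i is classified as effortful
% (response time above the item's threshold) and 0 if flagged as rapid guessing;
% m_i is the number of response options for item i.
P(X_{ij} = 1 \mid \theta_j)
  = \Delta_{ij}\,\frac{\exp\!\big[a_i(\theta_j - b_i)\big]}{1 + \exp\!\big[a_i(\theta_j - b_i)\big]}
  + \bigl(1 - \Delta_{ij}\bigr)\,\frac{1}{m_i}
```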
{"title":"Is Effort Moderated Scoring Robust to Multidimensional Rapid Guessing?","authors":"Joseph A. Rios, Jiayi Deng","doi":"10.1177/00131644241246749","DOIUrl":"https://doi.org/10.1177/00131644241246749","url":null,"abstract":"To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e., RG that is linearly related to examinee ability). Specifically, EM scoring is compared with the Holman–Glas (HG) method, a multidimensional scoring approach, in terms of model fit distortion, ability parameter recovery, and omega reliability distortion. Test difficulty, the proportion of RG present within a sample, and the strength of association between ability and RG propensity were manipulated to create 80 total conditions. Overall, the results showed that EM scoring provided improved model fit compared with HG scoring when RG comprised 12% or less of all item responses. Furthermore, no significant differences in ability parameter recovery and omega reliability distortion were noted when comparing these two scoring approaches under moderate degrees of RG multidimensionality. These limited differences were largely due to the limited impact of RG on aggregated ability (bias ranged from 0.00 to 0.05 logits) and reliability (distortion was ≤ .005 units) estimates when as much as 40% of item responses in the sample data reflected RG behavior.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"11 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140810537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing Accuracy of Parallel Analysis and Fit Statistics for Estimating the Number of Factors With Ordered Categorical Data in Exploratory Factor Analysis
Pub Date: 2024-04-17 | DOI: 10.1177/00131644241240435
Hyunjung Lee, Heining Cham
Determining the number of factors in exploratory factor analysis (EFA) is crucial because it affects the rest of the analysis and the conclusions of the study. Researchers have developed various methods for deciding the number of factors to retain in EFA, but this remains one of the most difficult decisions in EFA. The purpose of this study is to compare the performance of parallel analysis with that of fit indices, which researchers have started using as another strategy for determining the optimal number of factors in EFA. A Monte Carlo simulation was conducted with ordered categorical items because previous simulation studies have produced mixed results and ordered categorical items are common in behavioral science. The results of this study indicate that parallel analysis and the root mean square error of approximation (RMSEA) performed well in most conditions, followed by the Tucker–Lewis index (TLI) and then by the comparative fit index (CFI). The robust corrections of CFI, TLI, and RMSEA performed better than the original fit indices in detecting misfit in underfactored models. However, they did not produce satisfactory results in dichotomous data with a small sample size. Implications, limitations of this study, and future research directions are discussed.
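A bare-bones version of the parallel analysis step reads as follows; it is a sketch under simplifying assumptions (Pearson rather than polychoric correlations, normally distributed comparison data, a 95th-percentile criterion, counting all eigenvalues above their thresholds), not the simulation code used in the study.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, quantile=95, seed=0):
    """Retain factors whose observed eigenvalues exceed those of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sim_eig = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        sim_eig[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(sim_eig, quantile, axis=0)
    return int(np.sum(obs_eig > threshold))

# Toy check: six items loading on a single factor should usually return 1.
rng = np.random.default_rng(1)
factor = rng.standard_normal((500, 1))
items = factor @ np.full((1, 6), 0.7) + 0.7 * rng.standard_normal((500, 6))
print(parallel_analysis(items))
```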
{"title":"Comparing Accuracy of Parallel Analysis and Fit Statistics for Estimating the Number of Factors With Ordered Categorical Data in Exploratory Factor Analysis","authors":"Hyunjung Lee, Heining Cham","doi":"10.1177/00131644241240435","DOIUrl":"https://doi.org/10.1177/00131644241240435","url":null,"abstract":"Determining the number of factors in exploratory factor analysis (EFA) is crucial because it affects the rest of the analysis and the conclusions of the study. Researchers have developed various methods for deciding the number of factors to retain in EFA, but this remains one of the most difficult decisions in the EFA. The purpose of this study is to compare the parallel analysis with the performance of fit indices that researchers have started using as another strategy for determining the optimal number of factors in EFA. The Monte Carlo simulation was conducted with ordered categorical items because there are mixed results in previous simulation studies, and ordered categorical items are common in behavioral science. The results of this study indicate that the parallel analysis and the root mean square error of approximation (RMSEA) performed well in most conditions, followed by the Tucker–Lewis index (TLI) and then by the comparative fit index (CFI). The robust corrections of CFI, TLI, and RMSEA performed better in detecting misfit underfactored models than the original fit indices. However, they did not produce satisfactory results in dichotomous data with a small sample size. Implications, limitations of this study, and future research directions are discussed.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140616292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the Influence of Response Styles on Continuous Scale Assessments: Insights From a Novel Modeling Approach
Pub Date: 2024-04-17 | DOI: 10.1177/00131644241242789
Hung-Yu Huang
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing have led to greater focus on continuous response formats, which offer numerous advantages in both respondent experience and methodological considerations. Response styles, which are frequently observed in self-reported data, reflect a propensity to answer questionnaire items in a consistent manner, regardless of the item content. These response styles have been identified as causes of skewed scale scores and biased trait inferences. In this study, we investigate the impact of response styles on individuals’ responses within a continuous scale context, with a specific emphasis on extreme response style (ERS) and acquiescence response style (ARS). Building upon the established continuous response model (CRM), we propose extensions known as the CRM-ERS and CRM-ARS. These extensions are employed to quantitatively capture individual variations in these distinct response styles. The effectiveness of the proposed models was evaluated through a series of simulation studies. Bayesian methods were employed to effectively calibrate the model parameters. The results demonstrate that both models achieve satisfactory parameter recovery. Neglecting the effects of response styles led to biased estimation, underscoring the importance of accounting for these effects. Moreover, the estimation accuracy improved with increasing test length and sample size. An empirical analysis is presented to elucidate the practical applications and implications of the proposed models.
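Purely to illustrate the kind of distortion at issue (this is not the CRM-ERS or CRM-ARS proposed in the article), the toy sketch below lets a single person-level parameter pull a bounded continuous response toward the scale endpoints; the logistic mapping and the power-transform coding of extremity are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(5)              # latent trait values
base = 1 / (1 + np.exp(-theta))             # expected 0-1 response without response style
ers = 2.0                                   # assumed coding: values > 1 exaggerate extremity

# Symmetric transform around the scale midpoint: larger `ers` pushes responses
# toward 0 or 1 without changing which side of the midpoint they fall on.
styled = 0.5 + 0.5 * np.sign(base - 0.5) * np.abs(2 * base - 1) ** (1 / ers)
print(np.round(base, 2))
print(np.round(styled, 2))
```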
{"title":"Exploring the Influence of Response Styles on Continuous Scale Assessments: Insights From a Novel Modeling Approach","authors":"Hung-Yu Huang","doi":"10.1177/00131644241242789","DOIUrl":"https://doi.org/10.1177/00131644241242789","url":null,"abstract":"The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent experience and methodological considerations. Response styles, which are frequently observed in self-reported data, reflect a propensity to answer questionnaire items in a consistent manner, regardless of the item content. These response styles have been identified as causes of skewed scale scores and biased trait inferences. In this study, we investigate the impact of response styles on individuals’ responses within a continuous scale context, with a specific emphasis on extreme response style (ERS) and acquiescence response style (ARS). Building upon the established continuous response model (CRM), we propose extensions known as the CRM-ERS and CRM-ARS. These extensions are employed to quantitatively capture individual variations in these distinct response styles. The effectiveness of the proposed models was evaluated through a series of simulation studies. Bayesian methods were employed to effectively calibrate the model parameters. The results demonstrate that both models achieve satisfactory parameter recovery. Neglecting the effects of response styles led to biased estimation, underscoring the importance of accounting for these effects. Moreover, the estimation accuracy improved with increasing test length and sample size. An empirical analysis is presented to elucidate the practical applications and implications of the proposed models.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"35 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Impact of Insufficient Effort Responses on the Order of Category Thresholds in the Polytomous Rasch Model
Pub Date: 2024-04-13 | DOI: 10.1177/00131644241242806
Kuan-Yu Jin, Thomas Eckes
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable assessed. This research investigates how IER affects the intended category order of Likert-type scales, focusing on the category thresholds in the polytomous Rasch model. In a simulation study, we examined several IER patterns in datasets generated from the mixture model for IER (MMIER). The key findings were (a) random responding and overusing the non-extreme categories of a five-category scale were each associated with high frequencies of disordered category thresholds; (b) raising the IER rate from 5% to 10% led to a substantial increase in threshold disordering, particularly among easy and difficult items; (c) narrow distances between adjacent categories (0.5 logits) were associated with more frequent disordering, compared with wide distances (1.0 logits). Two real-data examples highlighted the efficiency and utility of the MMIER for detecting latent classes of respondents exhibiting different forms of IER. Under the MMIER, the frequency of disordered thresholds was reduced substantially in both examples. The discussion focuses on the practical implications of using the MMIER in survey research and points to directions for future research.
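To make "category thresholds" concrete, one standard partial credit (polytomous Rasch) parameterization is written out below; the notation is assumed for illustration rather than copied from the article.

```latex
% Item i has categories x = 0, 1, ..., m_i with thresholds tau_{i1}, ..., tau_{i m_i};
% the empty sum for x = 0 is defined as zero.
P(X_{ij} = x \mid \theta_j)
  = \frac{\exp\!\left[\sum_{k=1}^{x} \left(\theta_j - \tau_{ik}\right)\right]}
         {\sum_{r=0}^{m_i} \exp\!\left[\sum_{k=1}^{r} \left(\theta_j - \tau_{ik}\right)\right]}
```

The intended order is \tau_{i1} < \tau_{i2} < \dots < \tau_{i m_i}; a disordered threshold occurs whenever some \tau_{i,k+1} falls below \tau_{ik}, which is the outcome tallied in the simulation study.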
{"title":"The Impact of Insufficient Effort Responses on the Order of Category Thresholds in the Polytomous Rasch Model","authors":"Kuan-Yu Jin, Thomas Eckes","doi":"10.1177/00131644241242806","DOIUrl":"https://doi.org/10.1177/00131644241242806","url":null,"abstract":"Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable assessed. This research investigates how IER affects the intended category order of Likert-type scales, focusing on the category thresholds in the polytomous Rasch model. In a simulation study, we examined several IER patterns in datasets generated from the mixture model for IER (MMIER). The key findings were (a) random responding and overusing the non-extreme categories of a five-category scale were each associated with high frequencies of disordered category thresholds; (b) raising the IER rate from 5% to 10% led to a substantial increase in threshold disordering, particularly among easy and difficult items; (c) narrow distances between adjacent categories (0.5 logits) were associated with more frequent disordering, compared with wide distances (1.0 logits). Two real-data examples highlighted the efficiency and utility of the MMIER for detecting latent classes of respondents exhibiting different forms of IER. Under the MMIER, the frequency of disordered thresholds was reduced substantially in both examples. The discussion focuses on the practical implications of using the MMIER in survey research and points to directions for future research.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"116 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140581547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Latent Variable Forests for Latent Variable Score Estimation
Pub Date: 2024-04-01 | DOI: 10.1177/00131644241237502
Franz Classe, Christoph Kern
We develop a latent variable forest (LV Forest) algorithm for the estimation of latent variable scores with one or more latent variables. LV Forest estimates unbiased latent variable scores based on confirmatory factor analysis (CFA) models with ordinal and/or numerical response variables. Through parametric model restrictions paired with a nonparametric tree-based machine learning approach, LV Forest estimates latent variable scores using models that are unbiased with respect to relevant subgroups in the population. This way, estimated latent variable scores are interpretable with respect to systematic influences of covariates without being biased by these variables. By building a tree ensemble, LV Forest takes parameter heterogeneity in latent variable modeling into account to capture subgroups with both good model fit and stable parameter estimates. We apply LV Forest to simulated data with heterogeneous model parameters as well as to real large-scale survey data. We show that LV Forest improves the accuracy of score estimation if parameter heterogeneity is present.
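The ensemble logic can be caricatured as follows. This conceptual sketch is not the authors' algorithm (which grows trees with fit-based splitting around a genuine CFA); `score_fn` is a hypothetical stand-in for any routine that returns one factor score per respondent in the subsample it receives, and the toy data and single-split trees are assumptions.

```python
import numpy as np
import pandas as pd

def lv_forest_scores(df, covariates, score_fn, n_trees=50, seed=0):
    """Average subgroup-specific factor scores over an ensemble of covariate splits."""
    rng = np.random.default_rng(seed)
    scores = np.zeros((n_trees, len(df)))
    for t in range(n_trees):
        split_var = covariates[rng.integers(len(covariates))]   # one random split per tree
        left = (df[split_var] <= df[split_var].median()).to_numpy()
        for mask in (left, ~left):
            scores[t, mask] = score_fn(df.loc[mask])             # subgroup-specific scoring
    return scores.mean(axis=0)                                   # ensemble-averaged scores

# Toy usage with simulated indicators y1-y3 and covariates x1-x2.
rng = np.random.default_rng(1)
n = 400
eta = rng.standard_normal(n)
df = pd.DataFrame({f"y{i}": 0.7 * eta + 0.7 * rng.standard_normal(n) for i in (1, 2, 3)})
df["x1"] = rng.standard_normal(n)
df["x2"] = rng.standard_normal(n)

def toy_score_fn(frame):
    return frame[["y1", "y2", "y3"]].mean(axis=1).to_numpy()     # crude proxy for CFA scores

print(lv_forest_scores(df, ["x1", "x2"], toy_score_fn)[:5].round(3))
```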
{"title":"Latent Variable Forests for Latent Variable Score Estimation","authors":"Franz Classe, Christoph Kern","doi":"10.1177/00131644241237502","DOIUrl":"https://doi.org/10.1177/00131644241237502","url":null,"abstract":"We develop a latent variable forest (LV Forest) algorithm for the estimation of latent variable scores with one or more latent variables. LV Forest estimates unbiased latent variable scores based on confirmatory factor analysis (CFA) models with ordinal and/or numerical response variables. Through parametric model restrictions paired with a nonparametric tree-based machine learning approach, LV Forest estimates latent variable scores using models that are unbiased with respect to relevant subgroups in the population. This way, estimated latent variable scores are interpretable with respect to systematic influences of covariates without being biased by these variables. By building a tree ensemble, LV Forest takes parameter heterogeneity in latent variable modeling into account to capture subgroups with both good model fit and stable parameter estimates. We apply LV Forest to simulated data with heterogeneous model parameters as well as to real large-scale survey data. We show that LV Forest improves the accuracy of score estimation if parameter heterogeneity is present.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"20 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140581666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fused SDT/IRT Models for Mixed-Format Exams
Pub Date: 2024-03-28 | DOI: 10.1177/00131644241235333
Lawrence T. DeCarlo
A psychological framework for different types of items commonly used with mixed-format exams is proposed. A choice model based on signal detection theory (SDT) is used for multiple-choice (MC) items, whereas an item response theory (IRT) model is used for open-ended (OE) items. The SDT and IRT models are shown to share a common conceptualization in terms of latent states of “know/don’t know” at the examinee level. This in turn suggests a way to join or “fuse” the models—through the probability of knowing. A general model that fuses the SDT choice model, for MC items, with a generalized sequential logit model, for OE items, is introduced. Fitting SDT and IRT models simultaneously allows one to examine possible differences in psychological processes across the different types of items, to examine the effects of covariates in both models simultaneously, to allow for relations among the model parameters, and likely offers potential estimation benefits. The utility of the approach is illustrated with MC and OE items from large-scale international exams.
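Schematically, "fusing through the probability of knowing" can be written as below. This is a simplified illustration of the shared know/don't-know state, not the article's full SDT choice or generalized sequential logit components; the logistic form for \kappa_{ij} and the guessing term g_i are assumptions.

```latex
% kappa_{ij}: probability that examinee j knows item i, shared across item formats.
\kappa_{ij} = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}, \qquad
P(\text{MC item } i \text{ answered correctly})
  = \kappa_{ij} + \bigl(1 - \kappa_{ij}\bigr)\, g_i
```

The same \kappa_{ij} would then govern the scoring model for the open-ended items, which is what ties the two components together.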
{"title":"Fused SDT/IRT Models for Mixed-Format Exams","authors":"Lawrence T. DeCarlo","doi":"10.1177/00131644241235333","DOIUrl":"https://doi.org/10.1177/00131644241235333","url":null,"abstract":"A psychological framework for different types of items commonly used with mixed-format exams is proposed. A choice model based on signal detection theory (SDT) is used for multiple-choice (MC) items, whereas an item response theory (IRT) model is used for open-ended (OE) items. The SDT and IRT models are shown to share a common conceptualization in terms of latent states of “know/don’t know” at the examinee level. This in turn suggests a way to join or “fuse” the models—through the probability of knowing. A general model that fuses the SDT choice model, for MC items, with a generalized sequential logit model, for OE items, is introduced. Fitting SDT and IRT models simultaneously allows one to examine possible differences in psychological processes across the different types of items, to examine the effects of covariates in both models simultaneously, to allow for relations among the model parameters, and likely offers potential estimation benefits. The utility of the approach is illustrated with MC and OE items from large-scale international exams.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"40 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140322190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examining the Dynamic of Clustering Effects in Multilevel Designs: A Latent Variable Method Application
Pub Date: 2024-02-21 | DOI: 10.1177/00131644241228602
Tenko Raykov, Ahmed Haddadi, Christine DiStefano, Mohammed Alqabbaa
This note is concerned with the study of temporal development in several indices reflecting clustering effects in multilevel designs that are frequently utilized in educational and behavioral research. A latent variable method-based approach is outlined that can be used to obtain point and interval estimates of the growth or decline in important functions of level-specific variances in two-level and three-level settings. The procedure may also be employed to examine the stability of clustering effects over time. The method can be utilized with widely circulated latent variable modeling software and is illustrated using empirical examples.
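As one concrete example of such an index, the intraclass correlation in a two-level design at occasion t, and its change across occasions, can be written as follows (the notation is assumed here, not taken from the article):

```latex
% tau_t^2: between-cluster (level-2) variance at occasion t;
% sigma_t^2: within-cluster (level-1) variance at occasion t.
\rho_t = \frac{\tau_t^{2}}{\tau_t^{2} + \sigma_t^{2}}, \qquad
\Delta_{t,\,t+1} = \rho_{t+1} - \rho_t
```

The outlined latent variable approach provides point and interval estimates of functions such as \Delta_{t,t+1}, with analogous level-specific quantities in the three-level case.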
{"title":"Examining the Dynamic of Clustering Effects in Multilevel Designs: A Latent Variable Method Application","authors":"Tenko Raykov, Ahmed Haddadi, Christine DiStefano, Mohammed Alqabbaa","doi":"10.1177/00131644241228602","DOIUrl":"https://doi.org/10.1177/00131644241228602","url":null,"abstract":"This note is concerned with the study of temporal development in several indices reflecting clustering effects in multilevel designs that are frequently utilized in educational and behavioral research. A latent variable method-based approach is outlined, which can be used to point and interval estimate the growth or decline in important functions of level-specific variances in two-level and three-level settings. The procedure may also be employed for the purpose of examining stability over time in clustering effects. The method can be utilized with widely circulated latent variable modeling software, and is illustrated using empirical examples.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"14 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139954004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correcting for Extreme Response Style: Model Choice Matters
Pub Date: 2024-02-01 | Epub Date: 2023-02-17 | DOI: 10.1177/00131644231155838
Martijn Schoenmakers, Jesper Tijmstra, Jeroen Vermunt, Maria Bolsinova
Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these models are, however, rare in the literature, especially in the context of cross-cultural comparisons, where ERS is even more relevant due to cultural differences between groups. To remedy this issue, the current article examines two frequently used IRT models that can be estimated using standard software: a multidimensional nominal response model (MNRM) and an IRTree model. Studying conceptual differences between these models reveals that they differ substantially in their conceptualization of ERS. These differences result in different category probabilities between the models. To evaluate the impact of these differences in a multigroup context, a simulation study is conducted. Our results show that when the groups differ in their average ERS, the IRTree model and MNRM can drastically differ in their conclusions about the size and presence of differences in the substantive trait between these groups. An empirical example is given, and implications for the future use of both models and the conceptualization of ERS are discussed.
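One way to see the conceptual difference is through the pseudo-item recoding an IRTree typically relies on; the three-node scheme below for a five-point scale is a common choice assumed here for illustration, whereas the MNRM instead represents ERS through category-specific slopes on an additional dimension.

```python
def irtree_pseudo_items(x):
    """Recode a 5-point Likert response x into three binary pseudo-items:
    (midpoint, direction, extremity); None marks a node that is not reached."""
    midpoint = 1 if x == 3 else 0
    direction = None if x == 3 else (1 if x > 3 else 0)        # agree vs. disagree
    extremity = None if x == 3 else (1 if x in (1, 5) else 0)  # extreme vs. moderate
    return midpoint, direction, extremity

print([irtree_pseudo_items(x) for x in range(1, 6)])
# [(0, 0, 1), (0, 0, 0), (1, None, None), (0, 1, 0), (0, 1, 1)]
```

ERS then corresponds to the person effect on the extremity node, which is conceptually different from an ERS dimension that shifts all category probabilities at once.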
{"title":"Correcting for Extreme Response Style: Model Choice Matters.","authors":"Martijn Schoenmakers, Jesper Tijmstra, Jeroen Vermunt, Maria Bolsinova","doi":"10.1177/00131644231155838","DOIUrl":"10.1177/00131644231155838","url":null,"abstract":"<p><p>Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these models are however rare in the literature, especially in the context of cross-cultural comparisons, where ERS is even more relevant due to cultural differences between groups. To remedy this issue, the current article examines two frequently used IRT models that can be estimated using standard software: a multidimensional nominal response model (MNRM) and a IRTree model. Studying conceptual differences between these models reveals that they differ substantially in their conceptualization of ERS. These differences result in different category probabilities between the models. To evaluate the impact of these differences in a multigroup context, a simulation study is conducted. Our results show that when the groups differ in their average ERS, the IRTree model and MNRM can drastically differ in their conclusions about the size and presence of differences in the substantive trait between these groups. An empirical example is given and implications for the future use of both models and the conceptualization of ERS are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"1 1","pages":"145-170"},"PeriodicalIF":2.7,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10795569/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41386423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two-Method Measurement Planned Missing Data With Purposefully Selected Samples
Pub Date: 2024-01-05 | DOI: 10.1177/00131644231222603
M. Xu, Jessica A. R. Logan
Research designs that include planned missing data are gaining popularity in applied education research. These methods have traditionally relied on introducing missingness into data collections using the missing completely at random (MCAR) mechanism. This study assesses whether planned missingness can also be implemented when data are instead designed to be purposefully missing based on student performance. A research design with purposefully selected missingness would allow researchers to focus all assessment efforts on a target sample, while still maintaining the statistical power of the full sample. This study introduces the method and demonstrates the performance of the purposeful missingness method within the two-method measurement planned missingness design using a Monte Carlo simulation study. Results demonstrate that the purposeful missingness method can recover parameter estimates in models with as much accuracy as the MCAR method, across multiple conditions.
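The design contrast can be sketched in a few lines: under MCAR a random subsample receives the expensive measure, whereas under purposeful missingness the same assessment budget is spent on students selected by their performance on the inexpensive measure. The column names, the 30% budget, and the below-cutoff targeting rule are illustrative assumptions, not the study's design values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({"cheap_score": rng.normal(size=n)})                 # given to everyone
df["expensive_score"] = 0.8 * df["cheap_score"] + rng.normal(scale=0.6, size=n)

# MCAR: a random 30% of students receive the expensive measure.
mcar_mask = rng.random(n) < 0.30
# Purposeful: the expensive measure targets the lowest-scoring 30% on the cheap measure.
purposeful_mask = df["cheap_score"] <= df["cheap_score"].quantile(0.30)

df["expensive_mcar"] = df["expensive_score"].where(mcar_mask)          # NaN = not assessed
df["expensive_purposeful"] = df["expensive_score"].where(purposeful_mask)
print(df[["expensive_mcar", "expensive_purposeful"]].notna().mean())   # both roughly 0.30
```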
{"title":"Two-Method Measurement Planned Missing Data With Purposefully Selected Samples","authors":"M. Xu, Jessica A. R. Logan","doi":"10.1177/00131644231222603","DOIUrl":"https://doi.org/10.1177/00131644231222603","url":null,"abstract":"Research designs that include planned missing data are gaining popularity in applied education research. These methods have traditionally relied on introducing missingness into data collections using the missing completely at random (MCAR) mechanism. This study assesses whether planned missingness can also be implemented when data are instead designed to be purposefully missing based on student performance. A research design with purposefully selected missingness would allow researchers to focus all assessment efforts on a target sample, while still maintaining the statistical power of the full sample. This study introduces the method and demonstrates the performance of the purposeful missingness method within the two-method measurement planned missingness design using a Monte Carlo simulation study. Results demonstrate that the purposeful missingness method can recover parameter estimates in models with as much accuracy as the MCAR method, across multiple conditions.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"29 47","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139382541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}