Mixture-modelling-based Bayesian MH-RM algorithm for the multidimensional 4PLM
Shaoyang Guo, Yanlei Chen, Chanjin Zheng, Guiyu Li
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12300 · published 2023-02-02

Several recent works have tackled the estimation issue for the unidimensional four-parameter logistic model (4PLM). Despite these efforts, the issue remains a challenge for the multidimensional 4PLM (M4PLM). Fu et al. (2021) proposed a Gibbs sampler for the M4PLM, but it is time-consuming. In this paper, a mixture-modelling-based Bayesian MH-RM (MM-MH-RM) algorithm is proposed for the M4PLM to obtain maximum a posteriori (MAP) estimates. In two simulation studies and an empirical example comparing the MM-MH-RM algorithm to the original MH-RM algorithm, the MM-MH-RM algorithm retained the benefits of the mixture-modelling approach and produced more robust estimates with guaranteed convergence rates and fast computation. The MATLAB code for the MM-MH-RM algorithm is available in the online appendix.
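The M4PLM extends the four-parameter logistic model over a multidimensional latent trait, adding a lower asymptote (guessing) and an upper asymptote (slipping) to the 2PL kernel. As a point of reference, here is a minimal sketch of the standard M4PL item response function; the function name and parameterization are illustrative, not the authors' code:

```python
import math

def m4pl_prob(theta, a, d, c, u):
    """Multidimensional 4PL item response function:
    P(X = 1 | theta) = c + (u - c) * logistic(a' theta + d),
    where c is the lower asymptote (guessing) and u the upper
    asymptote (1 minus the slipping probability)."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    p = 1.0 / (1.0 + math.exp(-z))
    return c + (u - c) * p
```

At theta = 0 with intercept 0 the logistic kernel equals .5, so the probability sits midway between the two asymptotes; estimation is hard precisely because c and u trade off against the slope parameters.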
Multilevel SEM with random slopes in discrete data using the pairwise maximum likelihood
Maria T. Barendse, Yves Rosseel
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12294 · published 2023-01-12 · open access: https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12294

Pairwise maximum likelihood (PML) estimation is a promising method for multilevel models with discrete responses. Multilevel models take into account that units within a cluster tend to be more alike than units from different clusters. The pairwise likelihood is then obtained as the product of bivariate likelihoods for all within-cluster pairs of units and items. In this study, we investigate the PML estimation method with computationally intensive multilevel random intercept and random slope structural equation models (SEM) in discrete data. In pursuing this, we first reconsider the general ‘wide format’ (WF) approach for SEM and then extend the WF approach with random slopes. In a small simulation study we determine the accuracy and efficiency of the PML estimation method by varying the sample size (250, 500, 1000, 2000), the response scale (two-point, four-point), and the data-generating model (a mediation model with three random slopes; factor models with one and two random slopes). Overall, results show that the PML estimation method is capable of estimating computationally intensive random intercept and random slope multilevel models in the SEM framework with discrete data and many (six or more) latent variables with satisfactory accuracy and efficiency. However, the condition with 250 clusters combined with a two-point response scale shows more bias.
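The abstract's definition of the pairwise likelihood — a product of bivariate likelihoods over all within-cluster pairs — can be sketched directly. The helper below is a toy illustration, not the authors' implementation: `pair_prob` stands in for whatever bivariate model (e.g. a bivariate probit implied by the SEM) supplies the pair probabilities.

```python
import itertools
import math

def pairwise_loglik(clusters, pair_prob):
    """Pairwise log-likelihood: sum, over clusters and over all
    within-cluster pairs of responses, of the log bivariate
    probability P(Y_i = yi, Y_j = yj) returned by `pair_prob`."""
    total = 0.0
    for cluster in clusters:
        for yi, yj in itertools.combinations(cluster, 2):
            total += math.log(pair_prob(yi, yj))
    return total
```

Under a toy independence model, `pair_prob = lambda yi, yj: p(yi) * p(yj)`; in PML the gain comes from pairs sharing cluster-level random effects, so each bivariate term is a two-dimensional integral rather than a high-dimensional one.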
Penalized optimal scaling for ordinal variables with an application to international classification of functioning core sets
Aisouda Hoshiyar, Henk A. L. Kiers, Jan Gertheiss
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12297 · published 2023-01-10 · open access: https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12297

Ordinal data occur frequently in the social sciences. When applying principal component analysis (PCA), however, those data are often treated as numeric, implying linear relationships between the variables at hand; alternatively, non-linear PCA is applied where the obtained quantifications are sometimes hard to interpret. Non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories such that the proportion of variance in those new variables that is explained by a predefined number of principal components (PCs) is maximized. We propose a penalized version of non-linear PCA for ordinal variables that is a smoothed intermediate between standard PCA on category labels and non-linear PCA as used so far. The new approach is by no means limited to monotonic effects and offers both better interpretability of the non-linear transformation of the category labels and better performance on validation data than unpenalized non-linear PCA and/or standard linear PCA. In particular, an application of penalized optimal scaling to ordinal data as given with the International Classification of Functioning, Disability and Health (ICF) is provided.
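One way to read "a smoothed intermediate between standard PCA on category labels and non-linear PCA" is as a roughness penalty on the category quantifications. The sketch below shows a plausible second-order difference penalty of the kind common in penalized scaling; the exact penalty used in the paper may differ, so treat this as an assumption for illustration:

```python
def second_difference_penalty(q, lam):
    """Roughness penalty on ordered category quantifications q_1..q_K:
    lam * sum_k (q_{k+1} - 2*q_k + q_{k-1})**2.
    lam = 0 leaves non-linear PCA unpenalized; large lam forces the
    second differences to zero, i.e. equally spaced scores, which
    recovers standard PCA on the category labels."""
    return lam * sum((q[k + 1] - 2 * q[k] + q[k - 1]) ** 2
                     for k in range(1, len(q) - 1))
```

The penalty is added (with a minus sign) to the variance-accounted-for objective, so the tuning parameter traces a path between the two extremes mentioned in the abstract.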
Extending exploratory diagnostic classification models: Inferring the effect of covariates
Hulya Duygu Yigit, Steven Andrew Culpepper
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12298 · published 2023-01-05 · open access: https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12298

Diagnostic models provide a statistical framework for designing formative assessments by classifying student knowledge profiles according to a collection of fine-grained attributes. The context and ecosystem in which students learn may play an important role in skill mastery, and it is therefore important to develop methods for incorporating student covariates into diagnostic models. Including covariates may provide researchers and practitioners with the ability to evaluate novel interventions or understand the role of background knowledge in attribute mastery. Existing research is designed to include covariates in confirmatory diagnostic models, also known as restricted latent class models (RLCMs). We propose new methods for including covariates in exploratory RLCMs that jointly infer the latent structure and evaluate the role of covariates on performance and skill mastery. We present a novel Bayesian formulation and report a Markov chain Monte Carlo algorithm using a Metropolis-within-Gibbs algorithm for approximating the model parameter posterior distribution. We report Monte Carlo simulation evidence regarding the accuracy of our new methods and present results from an application that examines the role of student background knowledge in attribute mastery on a probability data set.
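A generic Metropolis-within-Gibbs loop of the kind mentioned — cycling through parameter blocks and applying a random-walk Metropolis update to each, conditional on the rest — can be sketched as follows. This is a schematic on a user-supplied log posterior, not the paper's sampler (which targets RLCM structure, item, and covariate parameters):

```python
import math
import random

def metropolis_within_gibbs(log_post, init, n_iter, step=0.5, seed=0):
    """Metropolis-within-Gibbs: sweep over coordinates, proposing a
    Gaussian random-walk move for each one while holding the others
    fixed, and accept with the usual Metropolis probability."""
    rng = random.Random(seed)
    x = list(init)
    draws = []
    for _ in range(n_iter):
        for j in range(len(x)):
            prop = list(x)
            prop[j] += rng.gauss(0.0, step)
            if math.log(rng.random()) < log_post(prop) - log_post(x):
                x = prop  # accept the coordinate-wise move
        draws.append(list(x))
    return draws
```

In practice each "coordinate" is a whole block (e.g. all item parameters), and discrete latent attributes are drawn exactly from their full conditionals rather than by Metropolis moves.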
Effect sizes in ANCOVA and difference-in-differences designs
Larry V. Hedges, Elizabeth Tipton, Rrita Zejnullahi, Karina G. Diaz
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12296 · published 2023-01-02

It is common practice in both randomized and quasi-experiments to adjust for baseline characteristics when estimating the average effect of an intervention. The inclusion of a pre-test, for example, can reduce both the standard error of this estimate and—in non-randomized designs—its bias. At the same time, it is also standard to report the effect of an intervention in standardized effect size units, thereby making it comparable to other interventions and studies. Curiously, the estimation of this effect size, including covariate adjustment, has received little attention. In this article, we provide a framework for defining effect sizes in designs with a pre-test (e.g., difference-in-differences and analysis of covariance) and propose estimators of those effect sizes. The estimators and approximations to their sampling distributions are evaluated using a simulation study and then demonstrated using an example from published data.
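To make the estimand concrete, here is one hypothetical effect size of the kind such a framework covers: a difference-in-differences estimate of the treatment effect standardized by the pooled pre-test standard deviation. The choice of standardizer is an assumption made for illustration — the paper's contribution is precisely the careful definition of such estimands and their sampling distributions:

```python
import math
import statistics

def did_effect_size(pre_t, post_t, pre_c, post_c):
    """Difference-in-differences effect size (illustrative): the DiD
    estimate of the treatment effect divided by the pooled pre-test SD,
    so covariate adjustment shrinks the numerator's SE without
    changing the standardizing metric."""
    did = (statistics.mean(post_t) - statistics.mean(pre_t)) \
        - (statistics.mean(post_c) - statistics.mean(pre_c))
    n_t, n_c = len(pre_t), len(pre_c)
    s2 = ((n_t - 1) * statistics.variance(pre_t)
          + (n_c - 1) * statistics.variance(pre_c)) / (n_t + n_c - 2)
    return did / math.sqrt(s2)
```

The point estimate is a ratio of two estimated quantities, which is why its sampling distribution needs the approximations the abstract mentions.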
Which method delivers greater signal-to-noise ratio: Structural equation modelling or regression analysis with weighted composites?
Ke-Hai Yuan, Yongfei Fang
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12293 · published 2022-12-02

Observational data typically contain measurement errors. Covariance-based structural equation modelling (CB-SEM) is capable of modelling measurement errors and yields consistent parameter estimates. In contrast, methods of regression analysis using weighted composites as well as a partial least squares approach to SEM facilitate the prediction and diagnosis of individuals/participants. But regression analysis with weighted composites has been known to yield attenuated regression coefficients when predictors contain errors. Contrary to the common belief that CB-SEM is the preferred method for the analysis of observational data, this article shows that regression analysis via weighted composites yields parameter estimates with much smaller standard errors, and thus corresponds to greater values of the signal-to-noise ratio (SNR). In particular, the SNR for the regression coefficient via the least squares (LS) method with equally weighted composites is mathematically greater than that by CB-SEM if the items for each factor are parallel, even when the SEM model is correctly specified and estimated by an efficient method. Analytical, numerical and empirical results also show that LS regression using weighted composites performs as well as or better than the normal maximum likelihood method for CB-SEM under many conditions even when the population distribution is multivariate normal. Results also show that the LS regression coefficients become more efficient when considering the sampling errors in the weights of composites than those that are conditional on weights.
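The composite side of the SNR comparison can be illustrated by simulation: form an equally weighted composite of parallel items, regress the outcome on it, and report |b| / SE(b). This toy sketch (names and simulation settings are assumptions, not the authors' study design) shows only the LS-with-composites half — the attenuated slope comes with a small standard error:

```python
import math
import random
import statistics

def snr_composite_regression(n=500, n_items=4, err_sd=1.0,
                             beta=0.5, seed=1):
    """Simulate y = beta * xi + e, with xi measured by parallel items;
    regress y on the equally weighted composite and return the
    signal-to-noise ratio b / SE(b) of the LS slope."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        items = [xi + rng.gauss(0.0, err_sd) for _ in range(n_items)]
        xs.append(sum(items) / n_items)   # equally weighted composite
        ys.append(beta * xi + rng.gauss(0.0, 1.0))
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    resid = [y - my - b * (x - mx) for x, y in zip(xs, ys)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return b / se
```

Measurement error biases b towards zero here (the attenuation the abstract notes), yet the ratio b / SE(b) can still exceed that of the unbiased CB-SEM estimate, which is the article's point.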
Empirical indistinguishability: From the knowledge structure to the skills
Andrea Spoto, Luca Stefanutti
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12291 · published 2022-11-10 · open access: https://onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12291

Recent literature has pointed out that the basic local independence model (BLIM), when applied to some specific instances of knowledge structures, presents identifiability issues. Furthermore, it has been shown that for such instances the model presents a stronger form of unidentifiability named empirical indistinguishability, which leads to the fact that the existence of certain knowledge states in such structures cannot be empirically tested. In this article the notion of indistinguishability is extended to skill maps and, more generally, to competence-based knowledge space theory. Theoretical results are provided showing that skill maps can be empirically indistinguishable from one another. The most relevant consequence of this is that for some skills there is no empirical evidence to establish their existence. This result is strictly related to the type of probabilistic model investigated, which is essentially the BLIM. Alternative models may exist or can be developed in knowledge space theory for which this indistinguishability problem disappears.
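For readers outside knowledge space theory, the BLIM assigns each response pattern a marginal probability by mixing over knowledge states, with item-specific careless-error and lucky-guess parameters. A minimal sketch of this standard formula (illustrative code, not the authors'); indistinguishability arises when different structures induce identical pattern probabilities:

```python
def blim_pattern_prob(pattern, states, state_probs, beta, eta):
    """BLIM marginal probability of a response pattern:
    P(R) = sum over states K of pi_K * prod over items q of
      (1 - beta[q]) or beta[q]  if q is in K (correct / careless error),
      eta[q] or (1 - eta[q])    if q is not in K (lucky guess / correct miss)."""
    total = 0.0
    for state, pi in zip(states, state_probs):
        p = pi
        for q, r in enumerate(pattern):
            if q in state:                      # item mastered in this state
                p *= (1.0 - beta[q]) if r else beta[q]
            else:                               # item not mastered
                p *= eta[q] if r else (1.0 - eta[q])
        total += p
    return total
```

Two parameterizations are empirically indistinguishable exactly when they give the same value of this function for every pattern, which is why the existence of certain states (or skills) cannot be tested from data alone.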
A note on the use of rank-ordered logit models for ordered response categories
Timothy R. Johnson
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12292 · published 2022-11-03

Models for rankings have been shown to produce more efficient estimators than comparable models for first/top choices. The discussions and applications of these models typically consider only unordered alternatives. But these models can be usefully adapted to the case where the alternatives a respondent ranks are ordered response categories. This paper proposes eliciting a rank order that is consistent with the ordering of the response categories, and then modelling the observed rankings using a variant of the rank-ordered logit model in which the distribution of rankings is truncated to the set of admissible rankings. This results in lower standard errors than when respondents select only a single top category. And the restrictions on the set of admissible rankings reduce the number of decisions respondents must make compared with ranking a set of unordered alternatives. Simulation studies and application examples featuring models based on a stereotype regression model and a rating scale item response model are provided to demonstrate the utility of this approach.
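The truncated model can be sketched with the Plackett-Luce (exploded logit) form: the probability of a ranking is a product of sequential choice probabilities, renormalized over the admissible set. Illustrative code, not the author's implementation:

```python
import math

def plackett_luce_prob(ranking, utilities):
    """Exploded-logit (Plackett-Luce) probability of one full ranking:
    pick the top alternative among those remaining, remove it, repeat."""
    remaining = list(ranking)
    p = 1.0
    while remaining:
        denom = sum(math.exp(utilities[k]) for k in remaining)
        p *= math.exp(utilities[remaining[0]]) / denom
        remaining = remaining[1:]
    return p

def truncated_ranking_prob(ranking, utilities, admissible):
    """Rank-ordered logit truncated to an admissible set of rankings:
    renormalize the Plackett-Luce probability over that set."""
    num = plackett_luce_prob(ranking, utilities)
    denom = sum(plackett_luce_prob(r, utilities) for r in admissible)
    return num / denom
```

For ordered categories the admissible set contains only rankings consistent with the category ordering around the top choice, which is what shrinks both the respondent's decision burden and the standard errors.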
Subtask analysis of process data through a predictive model
Zhi Wang, Xueying Tang, Jingchen Liu, Zhiliang Ying
British Journal of Mathematical & Statistical Psychology · doi:10.1111/bmsp.12290 · published 2022-11-01

Response process data collected from human–computer interactive items contain detailed information about respondents' behavioural patterns and cognitive processes. Such data are valuable sources for analysing respondents' problem-solving strategies. However, the irregular data format and the complex structure make standard statistical tools difficult to apply. This article develops a computationally efficient method for exploratory analysis of such process data. The new approach segments a lengthy individual process into a sequence of short subprocesses to achieve complexity reduction, easy clustering and meaningful interpretation. Each subprocess is considered a subtask. The segmentation is based on sequential action predictability using a parsimonious predictive model combined with the Shannon entropy. Simulation studies are conducted to assess the performance of the new method. We use a case study of PIAAC 2012 to demonstrate how exploratory analysis for process data can be carried out with the new approach.
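The segmentation idea — cut a process where the next action becomes hard to predict — can be sketched with a simple bigram predictive model and the Shannon entropy of its next-action distribution. The paper's predictive model is more parsimonious; this toy (names and the threshold rule are assumptions) just illustrates the entropy-based cut:

```python
import math
from collections import Counter, defaultdict

def entropy_segments(seq, sequences, threshold=1.0):
    """Segment an action sequence at points of low predictability:
    fit bigram next-action distributions on `sequences`, then cut `seq`
    before each action whose predictive Shannon entropy (in bits)
    exceeds `threshold`."""
    nxt = defaultdict(Counter)
    for s in sequences:
        for a, b in zip(s, s[1:]):
            nxt[a][b] += 1
    segments, current = [], [seq[0]]
    for a, b in zip(seq, seq[1:]):
        counts = nxt[a]
        total = sum(counts.values())
        h = -sum(c / total * math.log2(c / total)
                 for c in counts.values())
        if h > threshold:        # next action is unpredictable: new subtask
            segments.append(current)
            current = []
        current.append(b)
    segments.append(current)
    return segments
```

Within a subtask, actions follow each other almost deterministically (entropy near zero); boundaries between subtasks are exactly where many continuations are plausible, so high entropy marks a natural cut point.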