A Comparison of Robust Likelihood Estimators to Mitigate Bias From Rapid Guessing
Pub Date: 2022-05-01 | DOI: 10.1177/01466216221084371 | Applied Psychological Measurement, 46(3), 236-249
Joseph A Rios
Rapid guessing (RG) behavior can undermine measurement properties and score-based inferences. To mitigate this potential bias, practitioners have relied on response time information to identify and filter RG responses. However, response times may be unavailable in many testing contexts, such as paper-and-pencil administrations. When this is the case, self-report measures of effort and person-fit statistics have been used. These methods are limited in that inferences concerning motivation and aberrant responding are made at the examinee level. Because test takers can engage in a mixture of solution and RG behavior throughout a test administration, there is a need to limit the influence of potential aberrant responses at the item level. This can be done by employing robust estimation procedures. Since these estimators have received limited attention in the RG literature, the objective of this simulation study was to evaluate ability parameter estimation accuracy in the presence of RG by comparing maximum likelihood estimation (MLE) to two robust variants, the bisquare and Huber estimators. Two RG conditions were manipulated: RG percentage (10%, 20%, and 40%) and RG pattern (difficulty-based and changing state). Compared with the MLE procedure, both the bisquare and Huber estimators reduced bias in ability parameter estimates by as much as 94%. Given that the Huber estimator showed smaller standard deviations of error and performed as well as the bisquare approach under most conditions, it is recommended as a promising approach to mitigating bias from RG when response time information is unavailable.
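For context, the two robust estimators compared here downweight item-level likelihood contributions through a residual-based weight function. The following are the standard Huber and bisquare (Tukey biweight) forms from the general robust-estimation literature, not equations reproduced from this article, with $r_j$ a standardized item residual and $k$ a tuning constant:
$$ w_{\mathrm{Huber}}(r) = \begin{cases} 1, & |r| \le k \\ k/|r|, & |r| > k \end{cases} \qquad\qquad w_{\mathrm{bisquare}}(r) = \begin{cases} \left[1 - (r/k)^2\right]^2, & |r| \le k \\ 0, & |r| > k \end{cases} $$
The robust ability estimate then solves the weighted likelihood equation $\sum_j w(r_j)\, \partial \log L_j(\theta)/\partial \theta = 0$, so responses that are highly inconsistent with $\theta$ (e.g., rapid guesses on easy items) receive reduced influence.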
{"title":"A Comparison of Robust Likelihood Estimators to Mitigate Bias From Rapid Guessing.","authors":"Joseph A Rios","doi":"10.1177/01466216221084371","DOIUrl":"https://doi.org/10.1177/01466216221084371","url":null,"abstract":"<p><p>Rapid guessing (RG) behavior can undermine measurement property and score-based inferences. To mitigate this potential bias, practitioners have relied on response time information to identify and filter RG responses. However, response times may be unavailable in many testing contexts, such as paper-and-pencil administrations. When this is the case, self-report measures of effort and person-fit statistics have been used. These methods are limited in that inferences concerning motivation and aberrant responding are made at the examinee level. As test takers can engage in a mixture of solution and RG behavior throughout a test administration, there is a need to limit the influence of potential aberrant responses at the item level. This can be done by employing robust estimation procedures. Since these estimators have received limited attention in the RG literature, the objective of this simulation study was to evaluate ability parameter estimation accuracy in the presence of RG by comparing maximum likelihood estimation (MLE) to two robust variants, the bisquare and Huber estimators. Two RG conditions were manipulated, RG percentage (10%, 20%, and 40%) and pattern (difficulty-based and changing state). Contrasted to the MLE procedure, results demonstrated that both the bisquare and Huber estimators reduced bias in ability parameter estimates by as much as 94%. Given that the Huber estimator showed smaller standard deviations of error and performed equally as well as the bisquare approach under most conditions, it is recommended as a promising approach to mitigating bias from RG when response time information is unavailable.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 3","pages":"236-249"},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073634/pdf/10.1177_01466216221084371.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
bayMDS: An R Package for Bayesian Multidimensional Scaling and Choice of Dimension
Pub Date: 2022-05-01 | DOI: 10.1177/01466216221084219 | Applied Psychological Measurement, 46(3), 250-251
Man-Suk Oh, Eun-Kyung Lee
MDSIC computes and plots the MDSIC values that can be used to select the optimal number of dimensions for a given data set. There are also a few plot functions. plotObj shows pairwise scatter plots of the object configuration in a Euclidean space for the first three dimensions. plotTrace provides trace plots of parameter samples for visual inspection of MCMC convergence. plotDelDist plots the observed dissimilarity measures versus the Euclidean distances computed from the BMDS object configuration. bayMDSApp shows the results of bayMDS in a web-based GUI (graphical user interface).
{"title":"BayMDS: An R Package for Bayesian Multidimensional Scaling and Choice of Dimension.","authors":"Man-Suk Oh, Eun-Kyung Lee","doi":"10.1177/01466216221084219","DOIUrl":"https://doi.org/10.1177/01466216221084219","url":null,"abstract":"MDSIC computes and plots MDSIC that can be used to select optimal number of dimensions for a given data set. There are also a few plot functions. plotObj shows pairwise scatter plots of object con fi guration in a Euclidean space for the fi rst three dimensions. plotTrace provides trace plots of parameter samples for visual inspection of MCMC convergence. plotDelDist plots the observed dissimilarity measures versus Euclidean distances computed from BMDS object con fi guration. bayMDSApp shows the results of bayMDS in a web-based GUI (graphical user","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 3","pages":"250-251"},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073637/pdf/10.1177_01466216221084219.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measurement of Ability in Adaptive Learning and Assessment Systems when Learners Use On-Demand Hints
Pub Date: 2022-04-18 | DOI: 10.1177/01466216221084208 | Applied Psychological Measurement, 46(1), 219-235
M. Bolsinova, Benjamin E. Deonovic, Meirav Arieli-Attali, Burr Settles, Masato Hagiwara, G. Maris
Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. Learners' progress is monitored as they solve items matched to their level and aimed at specific learning goals. Scaffolding and providing learners with hints are powerful tools for supporting the learning process. One way of introducing hints is to make hint use the choice of the student: when learners are certain of their response, they answer without hints, but if they are not certain or do not know how to approach the item, they can request a hint. We develop measurement models for applications where such on-demand hints are available. These models take into account that hint use may be informative about ability, but at the same time may be influenced by other individual characteristics. Two modeling strategies are considered: (1) the measurement model is based on a scoring rule for ability that includes both response accuracy and hint use; (2) the choice to use hints and response accuracy conditional on this choice are modeled jointly using Item Response Tree models. The properties of the different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. The second dimension in the model accounts for individual differences in the tendency to use hints.
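To make the first strategy concrete, a hedged sketch of the kind of scoring rule described above (the partial-credit value $\lambda$ is illustrative, not a value reported in the article): the item score feeding the ability dimension could be defined as
$$ S_{pi} = \begin{cases} 1, & \text{correct without a hint} \\ \lambda, & \text{correct after requesting a hint}, \; 0 < \lambda < 1 \\ 0, & \text{incorrect (with or without a hint)}, \end{cases} $$
with a second latent dimension capturing the person-specific tendency to request hints; the IRTree alternative instead models the hint-use choice and the conditional accuracy as separate nodes of a response tree.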
{"title":"Measurement of Ability in Adaptive Learning and Assessment Systems when Learners Use On-Demand Hints","authors":"M. Bolsinova, Benjamin E. Deonovic, Meirav Arieli-Attali, Burr Settles, Masato Hagiwara, G. Maris","doi":"10.1177/01466216221084208","DOIUrl":"https://doi.org/10.1177/01466216221084208","url":null,"abstract":"Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. The learners’ progress is monitored through them solving items matching their level and aiming at specific learning goals. Scaffolding and providing learners with hints are powerful tools in helping the learning process. One way of introducing hints is to make hint use the choice of the student. When the learner is certain of their response, they answer without hints, but if the learner is not certain or does not know how to approach the item they can request a hint. We develop measurement models for applications where such on-demand hints are available. Such models take into account that hint use may be informative of ability, but at the same time may be influenced by other individual characteristics. Two modeling strategies are considered: (1) The measurement model is based on a scoring rule for ability which includes both response accuracy and hint use. (2) The choice to use hints and response accuracy conditional on this choice are modeled jointly using Item Response Tree models. The properties of different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. The second dimension in the model accounts for the individual differences in the tendency to use hints.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 1","pages":"219 - 235"},"PeriodicalIF":1.2,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43517867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Impact of Sampling Variability When Estimating the Explained Common Variance
Pub Date: 2022-04-15 | DOI: 10.1177/01466216221084215 | Applied Psychological Measurement, 46(1), 338-341
Björn Andersson, Hao Luo
Assessing the multidimensionality of a scale or test is a staple of educational and psychological measurement. One approach to evaluating approximate unidimensionality is to fit a bifactor model, where the subfactors are determined by substantive theory, and estimate the explained common variance (ECV) of the general factor. The ECV indicates to what extent the explained variance is dominated by the general factor over the specific factors and has been used, together with other methods and statistics, to determine whether a single-factor model is sufficient for analyzing a scale or test (Rodriguez et al., 2016). In addition, the individual item-ECV (I-ECV) has been used to assess approximate unidimensionality of individual items (Carnovale et al., 2021; Stucky et al., 2013). However, the ECV and I-ECV are subject to random estimation error, which previous studies have not considered. Not accounting for the error in estimation can lead to inaccurate conclusions regarding the dimensionality of a scale or item, especially when an estimate of ECV or I-ECV is compared to a pre-specified cut-off value to evaluate unidimensionality. The objective of the present study is to derive standard errors of the estimators of ECV and I-ECV with linear confirmatory factor analysis (CFA) models to enable the assessment of random estimation error and the computation of confidence intervals for the parameters. We use Monte Carlo simulation to assess the accuracy of the derived standard errors and evaluate the impact of sampling variability on the estimation of the ECV and I-ECV. In a bifactor model for $J$ items, denote $X_j$, $j = 1, \ldots, J$, as the observed variables and let $G$ denote the general factor. We define the $S$ subfactors $F_s$, $s \in \{1, \ldots, S\}$, and $J_s$ as the set of indicators for each subfactor. Each observed indicator $X_j$ is then defined by the multiple factor model (McDonald, 2013).
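The model equation itself is not reproduced in this excerpt. As a hedged sketch using the standard bifactor and ECV definitions from the cited literature (not equations taken from this article), the model and the two indices can be written as
$$ X_j = \nu_j + \lambda_{jG} G + \lambda_{js} F_s + \varepsilon_j, \quad j \in J_s, $$
$$ \mathrm{ECV} = \frac{\sum_{j=1}^{J} \lambda_{jG}^2}{\sum_{j=1}^{J} \lambda_{jG}^2 + \sum_{s=1}^{S} \sum_{j \in J_s} \lambda_{js}^2}, \qquad \mathrm{I\text{-}ECV}_j = \frac{\lambda_{jG}^2}{\lambda_{jG}^2 + \lambda_{js}^2}, $$
where $\lambda_{jG}$ and $\lambda_{js}$ are the general-factor and specific-factor loadings of item $j$.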
{"title":"Impact of Sampling Variability When Estimating the Explained Common Variance","authors":"Björn Andersson, Hao Luo","doi":"10.1177/01466216221084215","DOIUrl":"https://doi.org/10.1177/01466216221084215","url":null,"abstract":"Assessing multidimensionality of a scale or test is a staple of educational and psychological measurement. One approach to evaluate approximate unidimensionality is to fit a bifactor model where the subfactors are determined by substantive theory and estimate the explained common variance (ECV) of the general factor. The ECV says to what extent the explained variance is dominated by the general factor over the specific factors, and has been used, together with other methods and statistics, to determine if a single factor model is sufficient for analyzing a scale or test (Rodriguez et al., 2016). In addition, the individual item-ECV (I-ECV) has been used to assess approximate unidimensionality of individual items (Carnovale et al., 2021; Stucky et al., 2013). However, the ECVand I-ECVare subject to random estimation error which previous studies have not considered. Not accounting for the error in estimation can lead to conclusions regarding the dimensionality of a scale or item that are inaccurate, especially when an estimate of ECVor I-ECV is compared to a pre-specified cut-off value to evaluate unidimensionality. The objective of the present study is to derive standard errors of the estimators of ECV and I-ECV with linear confirmatory factor analysis (CFA) models to enable the assessment of random estimation error and the computation of confidence intervals for the parameters. We use Monte-Carlo simulation to assess the accuracy of the derived standard errors and evaluate the impact of sampling variability on the estimation of the ECV and I-ECV. In a bifactor model for J items, denote Xj, j 1⁄4 1, ..., J , as the observed variable and let G denote the general factor. We define the S subfactors Fs, s2f1,..., Sg, and Js as the set of indicators for each subfactor. Each observed indicator Xj is then defined by the multiple factor model (McDonald, 2013)","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 1","pages":"338 - 341"},"PeriodicalIF":1.2,"publicationDate":"2022-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42137052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation
Pub Date: 2022-03-07 | DOI: 10.1177/01466216211066601 | Applied Psychological Measurement, 46(1), 200-218
Kseniia Marcq, Björn Andersson
In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived, and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard errors of equating is minimal.
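For context, the continuization step referred to above is typically carried out with a Gaussian kernel; the following is the standard form from the kernel-equating literature, not an equation reproduced from this article. With score probabilities $r_j$ at score points $x_j$, mean $\mu_X$, variance $\sigma_X^2$, bandwidth $h_X$, and $\Phi$ the standard normal distribution function,
$$ F_{h_X}(x) = \sum_j r_j \, \Phi\!\left( \frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X} \right), \qquad a_X^2 = \frac{\sigma_X^2}{\sigma_X^2 + h_X^2}, $$
and the equipercentile equating function is $e_Y(x) = G_{h_Y}^{-1}\!\left(F_{h_X}(x)\right)$. The additional variability studied here arises because $h_X$ (and $h_Y$) must themselves be estimated from the data.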
{"title":"Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation","authors":"Kseniia Marcq, Björn Andersson","doi":"10.1177/01466216211066601","DOIUrl":"https://doi.org/10.1177/01466216211066601","url":null,"abstract":"In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results which suggest that the bandwidth variability impact on the standard error of equating is minimal.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 1","pages":"200 - 218"},"PeriodicalIF":1.2,"publicationDate":"2022-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49283258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SEMsens: An R Package for Sensitivity Analysis of Structural Equation Models With the Ant Colony Optimization Algorithm
Pub Date: 2022-03-01 | Epub Date: 2022-01-09 | DOI: 10.1177/01466216211063233 | Applied Psychological Measurement, 46(2), 159-161
Zuchao Shen, Walter L Leite
{"title":"SEMsens: An R Package for Sensitivity Analysis of Structural Equation Models With the Ant Colony Optimization Algorithm.","authors":"Zuchao Shen, Walter L Leite","doi":"10.1177/01466216211063233","DOIUrl":"10.1177/01466216211063233","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"159-161"},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908408/pdf/10.1177_01466216211063233.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10810177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predictive Fit Metrics for Item Response Models
Pub Date: 2022-03-01 | Epub Date: 2022-02-13 | DOI: 10.1177/01466216211066603 | Applied Psychological Measurement, 46(2), 136-155
Benjamin A Stenhaug, Benjamin W Domingue
The fit of an item response model is typically conceptualized as whether a given model could have generated the data. In this study, an alternative view of fit, "predictive fit," based on the model's ability to predict new data, is advocated. The authors define two prediction tasks: "missing responses prediction," where the goal is to predict an in-sample person's response to an in-sample item, and "missing persons prediction," where the goal is to predict an out-of-sample person's string of responses. Based on these prediction tasks, two predictive fit metrics are derived for item response models that assess how well an estimated item response model fits the data-generating model. These metrics are based on long-run out-of-sample predictive performance (i.e., if the data-generating model produced infinite amounts of data, what is the quality of the model's predictions on average?). Simulation studies are conducted to identify the prediction-maximizing model across a variety of conditions. For example, when prediction is defined in terms of missing responses, greater average person ability and greater item discrimination are both associated with the 3PL model producing relatively worse predictions, and thus lead to greater minimum sample sizes for the 3PL model. In each simulation, the prediction-maximizing model is compared to the models selected by Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and likelihood ratio tests. It is found that the performance of these methods depends on the prediction task of interest. In general, likelihood ratio tests often select overly flexible models, while BIC selects overly parsimonious models. The authors use Programme for International Student Assessment data to demonstrate how to use cross-validation to directly estimate the predictive fit metrics in practice. The implications for item response model selection in operational settings are discussed.
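As an illustration of the missing-responses prediction task, the following is a minimal sketch, not the authors' code: the 2PL data-generating parameters, the 20% hold-out fraction, and the stand-in parameter "estimates" are all hypothetical. It scores a candidate model by its out-of-sample log-likelihood on randomly held-out person-by-item responses.

import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, a, b):
    # 2PL probability of a correct response for every person-item pair
    return 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))

# Data-generating 2PL model (hypothetical parameter values)
n_persons, n_items = 500, 20
theta = rng.normal(size=n_persons)
a_true = rng.uniform(0.8, 2.0, size=n_items)
b_true = rng.normal(size=n_items)
responses = rng.binomial(1, p_correct(theta, a_true, b_true))

# Randomly hold out 20% of the person-by-item cells ("missing responses" task)
holdout = rng.random((n_persons, n_items)) < 0.2

# Stand-in estimates for a candidate model; in practice these would come from
# fitting, e.g., a 1PL or 2PL model to the non-held-out responses.
a_hat = np.ones(n_items)                        # Rasch-type constraint
b_hat = b_true + rng.normal(0, 0.1, size=n_items)
theta_hat = theta + rng.normal(0, 0.3, size=n_persons)

# Out-of-sample log-likelihood on the held-out cells: values closer to zero
# indicate better predictive fit for this task.
p_hat = p_correct(theta_hat, a_hat, b_hat)
eps = 1e-12
loglik = responses * np.log(p_hat + eps) + (1 - responses) * np.log(1 - p_hat + eps)
print("mean held-out log-likelihood:", loglik[holdout].mean())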
{"title":"Predictive Fit Metrics for Item Response Models.","authors":"Benjamin A Stenhaug, Benjamin W Domingue","doi":"10.1177/01466216211066603","DOIUrl":"10.1177/01466216211066603","url":null,"abstract":"<p><p>The fit of an item response model is typically conceptualized as whether a given model could have generated the data. In this study, for an alternative view of fit, \"predictive fit,\" based on the model's ability to predict new data is advocated. The authors define two prediction tasks: \"missing responses prediction\"-where the goal is to predict an in-sample person's response to an in-sample item-and \"missing persons prediction\"-where the goal is to predict an out-of-sample person's string of responses. Based on these prediction tasks, two predictive fit metrics are derived for item response models that assess how well an estimated item response model fits the data-generating model. These metrics are based on long-run out-of-sample predictive performance (i.e., if the data-generating model produced infinite amounts of data, what is the quality of a \"model's predictions on average?\"). Simulation studies are conducted to identify the prediction-maximizing model across a variety of conditions. For example, defining prediction in terms of missing responses, greater average person ability, and greater item discrimination are all associated with the 3PL model producing relatively worse predictions, and thus lead to greater minimum sample sizes for the 3PL model. In each simulation, the prediction-maximizing model to the model selected by Akaike's information criterion, Bayesian information criterion (BIC), and likelihood ratio tests are compared. It is found that performance of these methods depends on the prediction task of interest. In general, likelihood ratio tests often select overly flexible models, while BIC selects overly parsimonious models. The authors use Programme for International Student Assessment data to demonstrate how to use cross-validation to directly estimate the predictive fit metrics in practice. The implications for item response model selection in operational settings are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"136-155"},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908407/pdf/10.1177_01466216211066603.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10810179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Considerations for Fitting Dynamic Bayesian Networks With Latent Variables: A Monte Carlo Study
Pub Date: 2022-03-01 | DOI: 10.1177/01466216211066609 | Applied Psychological Measurement, 46(2), 116-135
Ray E Reichenberg, Roy Levy, Adam Clark
Dynamic Bayesian networks (DBNs; Reye, 2004) are a promising tool for modeling student proficiency under rich measurement scenarios (Reichenberg, 2018). These scenarios often present assessment conditions far more complex than what is seen with more traditional assessments and require assessment arguments and psychometric models capable of integrating those complexities. Unfortunately, DBNs remain understudied and their psychometric properties relatively unknown. The current work aimed to explore the properties of DBNs under a variety of realistic psychometric conditions. A Monte Carlo simulation study was conducted in order to evaluate parameter recovery for DBNs using maximum likelihood estimation. Manipulated factors included sample size, measurement quality, test length, and the number of measurement occasions. Results suggested that measurement quality has the most prominent impact on estimation quality, with more distinct performance categories yielding better estimation. From a practical perspective, parameter recovery appeared to be sufficient with samples as low as N = 400 as long as measurement quality was not poor and at least three items were present at each measurement occasion. Tests consisting of only a single item required exceptional measurement quality in order to adequately recover model parameters.
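For orientation, a DBN with a single latent proficiency measured at occasions $t = 1, \ldots, T$ by items $X_{t1}, \ldots, X_{tK}$ is commonly factored as follows (a generic latent-transition structure, not necessarily the specific parameterization used in the study):
$$ p(\theta_1, \ldots, \theta_T, \mathbf{X}) = p(\theta_1) \prod_{t=2}^{T} p(\theta_t \mid \theta_{t-1}) \prod_{t=1}^{T} \prod_{k=1}^{K} p(X_{tk} \mid \theta_t), $$
so measurement quality enters through the conditional response distributions $p(X_{tk} \mid \theta_t)$ and growth through the transition distributions $p(\theta_t \mid \theta_{t-1})$.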
{"title":"Considerations for Fitting Dynamic Bayesian Networks With Latent Variables: A Monte Carlo Study.","authors":"Ray E Reichenberg, Roy Levy, Adam Clark","doi":"10.1177/01466216211066609","DOIUrl":"https://doi.org/10.1177/01466216211066609","url":null,"abstract":"<p><p>Dynamic Bayesian networks (DBNs; Reye, 2004) are a promising tool for modeling student proficiency under rich measurement scenarios (Reichenberg, 2018). These scenarios often present assessment conditions far more complex than what is seen with more traditional assessments and require assessment arguments and psychometric models capable of integrating those complexities. Unfortunately, DBNs remain understudied and their psychometric properties relatively unknown. The current work aimed at exploring the properties of DBNs under a variety of realistic psychometric conditions. A Monte Carlo simulation study was conducted in order to evaluate parameter recovery for DBNs using maximum likelihood estimation. Manipulated factors included sample size, measurement quality, test length, the number of measurement occasions. Results suggested that measurement quality has the most prominent impact on estimation quality with more distinct performance categories yielding better estimation. From a practical perspective, parameter recovery appeared to be sufficient with samples as low as <i>N</i> = 400 as long as measurement quality was not poor and at least three items were present at each measurement occasion. Tests consisting of only a single item required exceptional measurement quality in order to adequately recover model parameters.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"116-135"},"PeriodicalIF":1.2,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908410/pdf/10.1177_01466216211066609.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10615071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Approaches for Detecting Differential Item Functioning Using the Generalized Graded Unfolding Model
Pub Date: 2022-03-01 | DOI: 10.1177/01466216211066606 | Applied Psychological Measurement, 46(2), 98-115
Seang-Hwane Joo, Philseok Lee, Stephen Stark
Differential item functioning (DIF) analysis is one of the most important applications of item response theory (IRT) in psychological assessment. This study examined the performance of two Bayesian DIF methods, the Bayes factor (BF) and the deviance information criterion (DIC), with the generalized graded unfolding model (GGUM). Type I error and power were investigated in a Monte Carlo simulation that manipulated sample size, DIF source, DIF size, DIF location, subpopulation trait distribution, and type of baseline model. We also examined the performance of two likelihood-based methods, the likelihood ratio (LR) test and the Akaike information criterion (AIC), using marginal maximum likelihood (MML) estimation for comparison with past DIF research. The results indicated that the proposed BF and DIC methods provided well-controlled Type I error rates and high power using a free-baseline model implementation, and their performance was superior to LR and AIC in terms of Type I error rates when the reference and focal group trait distributions differed. The implications and recommendations for applied research are discussed.
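For reference, the two Bayesian criteria compared in the study are typically defined as follows (standard definitions from the Bayesian model-selection literature, not reproduced from this article). For competing models $M_0$ (no DIF) and $M_1$ (DIF), the Bayes factor is the ratio of marginal likelihoods,
$$ \mathrm{BF}_{10} = \frac{p(\mathbf{y} \mid M_1)}{p(\mathbf{y} \mid M_0)}, $$
and the deviance information criterion of a model is
$$ \mathrm{DIC} = \bar{D} + p_D, \qquad D(\vartheta) = -2 \log p(\mathbf{y} \mid \vartheta), \qquad p_D = \bar{D} - D(\bar{\vartheta}), $$
where $\bar{D}$ is the posterior mean deviance and $\bar{\vartheta}$ the posterior mean of the parameters; smaller DIC values indicate the preferred model.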
{"title":"Bayesian Approaches for Detecting Differential Item Functioning Using the Generalized Graded Unfolding Model.","authors":"Seang-Hwane Joo, Philseok Lee, Stephen Stark","doi":"10.1177/01466216211066606","DOIUrl":"https://doi.org/10.1177/01466216211066606","url":null,"abstract":"<p><p>Differential item functioning (DIF) analysis is one of the most important applications of item response theory (IRT) in psychological assessment. This study examined the performance of two Bayesian DIF methods, Bayes factor (BF) and deviance information criterion (DIC), with the generalized graded unfolding model (GGUM). The Type I error and power were investigated in a Monte Carlo simulation that manipulated sample size, DIF source, DIF size, DIF location, subpopulation trait distribution, and type of baseline model. We also examined the performance of two likelihood-based methods, the likelihood ratio (LR) test and Akaike information criterion (AIC), using marginal maximum likelihood (MML) estimation for comparison with past DIF research. The results indicated that the proposed BF and DIC methods provided well-controlled Type I error and high power using a free-baseline model implementation, their performance was superior to LR and AIC in terms of Type I error rates when the reference and focal group trait distributions differed. The implications and recommendations for applied research are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"98-115"},"PeriodicalIF":1.2,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908411/pdf/10.1177_01466216211066606.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10800335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scale Linking for the Testlet Item Response Theory Model
Pub Date: 2022-03-01 | DOI: 10.1177/01466216211063234 | Applied Psychological Measurement, 46(2), 79-97
Seonghoon Kim, Michael J Kolen
In their 2005 paper, Li and her colleagues proposed a test response function (TRF) linking method for a two-parameter testlet model and used a genetic algorithm to find minimization solutions for the linking coefficients. In the present paper the linking task for a three-parameter testlet model is formulated from the perspective of bi-factor modeling, and three linking methods for the model are presented: the TRF, mean/least squares (MLS), and item response function (IRF) methods. Simulations are conducted to compare the TRF method using a genetic algorithm with the TRF and IRF methods using a quasi-Newton algorithm and the MLS method. The results indicate that the IRF, MLS, and TRF methods perform very well, well, and poorly, respectively, in estimating the linking coefficients associated with testlet effects, that the use of genetic algorithms offers little improvement to the TRF method, and that the minimization function for the TRF method is not as well-structured as that for the IRF method.
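For context, the linking coefficients referred to above are typically the slope and intercept of a linear transformation of the latent scale. In the standard (non-testlet) 3PL case they act on the item parameters as $a_j^{*} = a_j / A$, $b_j^{*} = A b_j + B$, $c_j^{*} = c_j$, and the IRF (characteristic-curve) method chooses $A$ and $B$ to minimize
$$ Q(A, B) = \sum_{j} \int \left[ P_j\!\left(\theta; a_j^{*}, b_j^{*}, c_j^{*}\right) - P_j\!\left(\theta; \tilde{a}_j, \tilde{b}_j, \tilde{c}_j\right) \right]^2 w(\theta)\, d\theta, $$
while the TRF method minimizes the squared difference of the summed (test) response functions rather than summing item-level squared differences. This is the standard form from the IRT linking literature; the testlet-model versions studied here additionally involve coefficients for the testlet-specific dimensions.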
{"title":"Scale Linking for the Testlet Item Response Theory Model.","authors":"Seonghoon Kim, Michael J Kolen","doi":"10.1177/01466216211063234","DOIUrl":"https://doi.org/10.1177/01466216211063234","url":null,"abstract":"<p><p>In their 2005 paper, Li and her colleagues proposed a test response function (TRF) linking method for a two-parameter testlet model and used a genetic algorithm to find minimization solutions for the linking coefficients. In the present paper the linking task for a three-parameter testlet model is formulated from the perspective of bi-factor modeling, and three linking methods for the model are presented: the TRF, mean/least squares (MLS), and item response function (IRF) methods. Simulations are conducted to compare the TRF method using a genetic algorithm with the TRF and IRF methods using a quasi-Newton algorithm and the MLS method. The results indicate that the IRF, MLS, and TRF methods perform very well, well, and poorly, respectively, in estimating the linking coefficients associated with testlet effects, that the use of genetic algorithms offers little improvement to the TRF method, and that the minimization function for the TRF method is not as well-structured as that for the IRF method.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"79-97"},"PeriodicalIF":1.2,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908412/pdf/10.1177_01466216211063234.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10810181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}