Investigating How Test-Takers Change Their Strategies to Handle Difficulty in Taking a Reading Comprehension Test: Implications for Score Validation
Pub Date: 2018-07-03 | DOI: 10.1080/15305058.2017.1396464
Amery Wu, Michelle Y. Chen, J. Stone
This article investigates how test-takers change their strategies to handle increased test difficulty. An adult sample reported their test-taking strategies immediately after completing the tasks in a reading test. Data were analyzed using structural equation modeling specifying a measurement-invariant, ability-moderated latent transition analysis in Mplus (Muthén & Asparouhov, 2011). Almost half of the test-takers (47%) changed their strategies when encountering increased task difficulty. The changes were characterized by augmenting comprehending-meaning strategies with score-maximizing and test-wiseness strategies. Moreover, test-takers' ability was the driving influence that facilitated and/or buffered the changes. The test outcomes, when reviewed in light of adjusted test-taking strategies, demonstrated a form of process-based validity evidence.
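As a toy illustration of the transition idea behind this design (not the measurement-invariant, ability-moderated model actually fit in Mplus), the Python sketch below cross-tabulates hypothetical strategy-class assignments on easier versus harder tasks to obtain transition proportions and the share of test-takers who switched class.

```python
import numpy as np

# Hypothetical strategy-class assignments (0 = comprehending-meaning only,
# 1 = augmented with score-maximizing / test-wiseness strategies) for each
# test-taker on the easier and on the harder task set. In the study these
# would come from the fitted latent transition model, not from raw counts.
easy_class = np.array([0, 0, 1, 0, 1, 0, 0, 1])
hard_class = np.array([0, 1, 1, 1, 1, 0, 1, 1])

# 2 x 2 transition table: rows = class on easier tasks, columns = class on harder tasks.
table = np.zeros((2, 2))
for e, h in zip(easy_class, hard_class):
    table[e, h] += 1
transition_props = table / table.sum(axis=1, keepdims=True)

switched = np.mean(easy_class != hard_class)  # share of test-takers changing class
print(transition_props)
print(f"Proportion changing strategy class: {switched:.2f}")
```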
{"title":"Investigating How Test-Takers Change Their Strategies to Handle Difficulty in Taking a Reading Comprehension Test: Implications for Score Validation","authors":"Amery Wu, Michelle Y. Chen, J. Stone","doi":"10.1080/15305058.2017.1396464","DOIUrl":"https://doi.org/10.1080/15305058.2017.1396464","url":null,"abstract":"This article investigates how test-takers change their strategies to handle increased test difficulty. An adult sample reported their test-taking strategies immediately after completing the tasks in a reading test. Data were analyzed using structural equation modeling specifying a measurement-invariant, ability-moderated, latent transition analysis in Mplus (Muthén & Asparouhov, 2011). It was found that almost half of the test-takers (47%) changed their strategies when encountering increased task-difficulty. The changes were characterized by augmenting comprehending-meaning strategies with score-maximizing and test-wiseness strategies. Moreover, test-takers' ability was the driving influence that facilitated and/or buffered the changes. The test outcomes, when reviewed in light of adjusted test-taking strategies, demonstrated a form of process-based validity evidence.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1396464","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45131489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Incongruence Between Native and Test Administration Languages: Towards Equal Opportunity in International Literacy Assessment
Pub Date: 2018-07-03 | DOI: 10.1080/15305058.2017.1407767
Patriann Smith, P. Frazier, Jaehoon Lee, R. Chang
Previous research has primarily addressed the effects of language on the Program for International Student Assessment (PISA) mathematics and science assessments. More recent research has focused on the effects of language on PISA reading comprehension and literacy assessments for student populations in specific Organization for Economic Cooperation and Development (OECD) and non-OECD countries. Recognizing calls to highlight the impact of language on student PISA reading performance across countries, the purpose of this study was to examine the effect of home languages versus test languages on PISA reading literacy across OECD and non-OECD economies, while considering other factors. The results of ordinary least squares (OLS) regression showed that about half of the economies demonstrated a positive and significant effect of students' language status on their reading performance. This finding is consistent with observations in the parallel analysis of PISA 2009 data, suggesting that students' performance on the reading literacy assessment was higher when they were tested in their home language. Our findings highlight the importance of the role of context, the need for new approaches to test translation, and the potential similarities in language status for youth from OECD and non-OECD countries that have implications for interpreting their PISA reading literacy assessments.
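A minimal sketch of the kind of economy-level regression described here, using simulated data; the variable names, the SES covariate, and all coefficients are assumptions for illustration only, not the study's actual model or results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical student-level data for one economy: a PISA-like reading score,
# an indicator for whether the test language matches the home language,
# and one illustrative covariate (an SES index).
n = 500
same_language = rng.integers(0, 2, n)          # 1 = tested in home language
ses = rng.normal(0, 1, n)
reading = 460 + 25 * same_language + 15 * ses + rng.normal(0, 80, n)

# OLS: reading ~ intercept + same_language + ses
X = np.column_stack([np.ones(n), same_language, ses])
beta, *_ = np.linalg.lstsq(X, reading, rcond=None)

resid = reading - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
print(f"language-status effect: {beta[1]:.1f} (SE {se[1]:.1f})")
```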
{"title":"Incongruence Between Native and Test Administration Languages: Towards Equal Opportunity in International Literacy Assessment","authors":"Patriann Smith, P. Frazier, Jaehoon Lee, R. Chang","doi":"10.1080/15305058.2017.1407767","DOIUrl":"https://doi.org/10.1080/15305058.2017.1407767","url":null,"abstract":"Previous research has primarily addressed the effects of language on the Program for International Student Assessment (PISA) mathematics and science assessments. More recent research has focused on the effects of language on PISA reading comprehension and literacy assessments on student populations in specific Organization for Economic Cooperation and Development (OECD) and non-OECD countries. Recognizing calls to highlight the impact of language on student PISA reading performance across countries, the purpose of this study was to examine the effect of home languages versus test languages on PISA reading literacy across OECD and non-OECD economies, while considering other factors. The results of Ordinary Least Squares regression showed that about half of the economies demonstrated a positive and significant effect of students' language status on their reading performance. This finding is consistent with observations in the parallel analysis of PISA 2009 data, suggesting that students' performance on reading literacy assessment was higher when they were tested in their home language. Our findings highlight the importance of the role of context, the need for new approaches to test translation, and the potential similarities in language status for youth from OECD and non-OECD countries that have implications for interpreting their PISA reading literacy assessments.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1407767","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41346188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ITC Guidelines for Translating and Adapting Tests (Second Edition)
Pub Date: 2018-04-03 | DOI: 10.1080/15305058.2017.1398166
The second edition of the International Test Commission Guidelines for Translating and Adapting Tests was prepared between 2005 and 2015 to improve upon the first edition, and to respond to advances in testing technology and practices. The 18 guidelines are organized into six categories to facilitate their use: pre-condition (3), test development (5), confirmation (4), administration (2), scoring and interpretation (2), and documentation (2). For each guideline, an explanation is provided along with suggestions for practice. A checklist is provided to improve the implementation of the guidelines.
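For readers building a review workflow around the guidelines, the category breakdown quoted above can be captured in a small data structure; the dictionary below simply mirrors the counts listed in the abstract.

```python
# The six guideline categories and their counts, as listed in the abstract;
# a convenient scaffold for a translation/adaptation review checklist.
ITC_CATEGORIES = {
    "pre-condition": 3,
    "test development": 5,
    "confirmation": 4,
    "administration": 2,
    "scoring and interpretation": 2,
    "documentation": 2,
}
assert sum(ITC_CATEGORIES.values()) == 18
```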
{"title":"ITC Guidelines for Translating and Adapting Tests (Second Edition)","authors":"","doi":"10.1080/15305058.2017.1398166","DOIUrl":"https://doi.org/10.1080/15305058.2017.1398166","url":null,"abstract":"The second edition of the International Test Commission Guidelines for Translating and Adapting Tests was prepared between 2005 and 2015 to improve upon the first edition, and to respond to advances in testing technology and practices. The 18 guidelines are organized into six categories to facilitate their use: pre-condition (3), test development (5), confirmation (4), administration (2), scoring and interpretation (2), and documentation (2). For each guideline, an explanation is provided along with suggestions for practice. A checklist is provided to improve the implementation of the guidelines.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1398166","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49495890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting Curvilinear Relationships: A Comparison of Scoring Approaches Based on Different Item Response Models
Pub Date: 2018-04-03 | DOI: 10.1080/15305058.2017.1345913
Mengyang Cao, Q. Song, L. Tay
There is a growing use of noncognitive assessments around the world, and recent research has posited an ideal point response process underlying such measures. A critical issue is whether the typical use of dominance approaches (e.g., average scores, factor analysis, and Samejima's graded response model) in scoring such measures is adequate. This study examined the performance of an ideal point scoring approach (e.g., the generalized graded unfolding model) as compared to the typical dominance scoring approaches in detecting curvilinear relationships between a scored trait and an external variable. Simulation results showed that when data followed the ideal point model, the ideal point approach generally exhibited more power and provided more accurate estimates of curvilinear effects than the dominance approaches. No substantial difference was found between the ideal point and dominance scoring approaches in terms of Type I error rate and bias across different sample sizes and scale lengths, although skewness in the distributions of the trait and the external variable can potentially reduce statistical power. For dominance data, the ideal point scoring approach exhibited convergence problems in most conditions and failed to perform as well as the dominance scoring approaches. Practical implications for scoring responses to Likert-type surveys to examine curvilinear effects are discussed.
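The curvilinearity test itself is typically a polynomial regression of the external variable on the scored trait; the sketch below shows that step on simulated data with a built-in inverted-U effect. The scoring of the trait (ideal point versus dominance) is assumed to have happened upstream, and the data-generating values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scored trait (from either scoring approach) and an external
# criterion with a genuine curvilinear (inverted-U) relationship.
theta = rng.normal(0, 1, 1000)
criterion = 0.5 * theta - 0.3 * theta**2 + rng.normal(0, 1, 1000)

# Test for curvilinearity: regress the criterion on the trait and its square,
# then inspect the quadratic coefficient.
X = np.column_stack([np.ones_like(theta), theta, theta**2])
beta, *_ = np.linalg.lstsq(X, criterion, rcond=None)

resid = criterion - X @ beta
sigma2 = resid @ resid / (len(theta) - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_quadratic = beta[2] / se[2]
print(f"quadratic coefficient: {beta[2]:.3f}, t = {t_quadratic:.1f}")
```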
{"title":"Detecting Curvilinear Relationships: A Comparison of Scoring Approaches Based on Different Item Response Models","authors":"Mengyang Cao, Q. Song, L. Tay","doi":"10.1080/15305058.2017.1345913","DOIUrl":"https://doi.org/10.1080/15305058.2017.1345913","url":null,"abstract":"There is a growing use of noncognitive assessments around the world, and recent research has posited an ideal point response process underlying such measures. A critical issue is whether the typical use of dominance approaches (e.g., average scores, factor analysis, and the Samejima's graded response model) in scoring such measures is adequate. This study examined the performance of an ideal point scoring approach (e.g., the generalized graded unfolding model) as compared to the typical dominance scoring approaches in detecting curvilinear relationships between scored trait and external variable. Simulation results showed that when data followed the ideal point model, the ideal point approach generally exhibited more power and provided more accurate estimates of curvilinear effects than the dominance approaches. No substantial difference was found between ideal point and dominance scoring approaches in terms of Type I error rate and bias across different sample sizes and scale lengths, although skewness in the distribution of trait and external variable can potentially reduce statistical power. For dominance data, the ideal point scoring approach exhibited convergence problems in most conditions and failed to perform as well as the dominance scoring approaches. Practical implications for scoring responses to Likert-type surveys to examine curvilinear effects are discussed.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1345913","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49612645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Response Time Based Nonparametric Kullback-Leibler Divergence Measure for Detecting Aberrant Test-Taking Behavior
Pub Date: 2018-02-28 | DOI: 10.1080/15305058.2018.1429446
K. Man, Jeffery R. Harring, Yunbo Ouyang, Sarah L. Thomas
Many important high-stakes decisions—college admission, academic performance evaluation, and even job promotion—depend on accurate and reliable scores from valid large-scale assessments. However, examinees sometimes cheat by copying answers from other test-takers or practicing with test items ahead of time, which can undermine the effectiveness of such assessments in yielding accurate and precise information about examinees' performance. This study focuses on the utility of a new nonparametric person-fit index that uses examinees' response times to detect two types of cheating behaviors. The feasibility of this method was investigated through a Monte Carlo simulation as well as through an analysis of data from a large-scale assessment. Findings indicate that the proposed index was quite successful in detecting pre-knowledge cheating and extreme one-item cheating.
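The abstract does not give the index's formula, but the flavor of a response-time-based KL measure can be sketched by comparing an examinee's binned response-time distribution with a group reference distribution. The binning, the example proportions, and the cutoff below are illustrative assumptions, not the published definition.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """Discrete Kullback-Leibler divergence KL(p || q) over common bins."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical binned response-time proportions: the group reference and one
# examinee who answers many items implausibly fast (consistent with item
# pre-knowledge).
reference = [0.05, 0.20, 0.40, 0.25, 0.10]   # proportions per response-time bin
examinee  = [0.45, 0.30, 0.15, 0.07, 0.03]

index = kl_divergence(examinee, reference)
flag = index > 0.5   # illustrative cutoff; operational cutoffs would be set empirically
print(f"KL-based misfit index: {index:.3f}, flagged: {flag}")
```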
{"title":"Response Time Based Nonparametric Kullback-Leibler Divergence Measure for Detecting Aberrant Test-Taking Behavior","authors":"K. Man, Jeffery R. Harring, Yunbo Ouyang, Sarah L. Thomas","doi":"10.1080/15305058.2018.1429446","DOIUrl":"https://doi.org/10.1080/15305058.2018.1429446","url":null,"abstract":"Many important high-stakes decisions—college admission, academic performance evaluation, and even job promotion—depend on accurate and reliable scores from valid large-scale assessments. However, examinees sometimes cheat by copying answers from other test-takers or practicing with test items ahead of time, which can undermine the effectiveness of such assessments in yielding accurate, precise information of examinees' performances. This study focuses on the utility of a new nonparametric person-fit index using examinees' response times to detect two types of cheating behaviors. The feasibility of this method was investigated vis-à-vis a Monte Carlo simulation as well as through analyzing data from a large-scale assessment. Findings indicate that the proposed index was quite successful in detecting pre-knowledge cheating and extreme one-item cheating.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1429446","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47362564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FIPC Linking Across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions
Pub Date: 2018-02-27 | DOI: 10.1080/15305058.2018.1428980
S. Kim, Ki Cole, M. Mwavita
This study investigated the effects of linking potentially multidimensional test forms using fixed item parameter calibration (FIPC). Forms had equal or unequal total test difficulty, with and without confounding difficulty. The mean square errors and bias of the estimated item and ability parameters were compared across the various confounding conditions. The estimated discrimination parameters were influenced by the level of correlation between dimensions. The mean square errors (MSEs) of the estimated discrimination parameters relative to the average of the true discrimination parameters were smallest when the correlation equaled 0; however, the MSEs relative to the multidimensional discrimination parameter were smallest when the correlation was larger than 0. The estimated difficulty parameters were strongly affected by different amounts of confounding difficulty within dimensions. Furthermore, the MSEs of the estimated abilities relative to the average of the true ability parameters on the first and second dimensions were smaller than those relative to the ability parameter on each separate dimension for all conditions. The pattern varied with the number of common items, and the measures of MSE and squared bias were relatively consistent across forms at the same level of correlation, except for the condition where the correlation was 0 and the number of common items was 8.
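The recovery summaries reported in such linking simulations are straightforward to compute; a minimal sketch, with hypothetical true and estimated difficulty values, is shown below.

```python
import numpy as np

def mse_and_bias(estimates, true_values):
    """Recovery summaries used in linking simulations: mean square error and bias."""
    estimates = np.asarray(estimates, float)
    true_values = np.asarray(true_values, float)
    errors = estimates - true_values
    return float(np.mean(errors**2)), float(np.mean(errors))

# Hypothetical item-difficulty recovery for one simulated condition.
true_b = np.array([-1.2, -0.4, 0.0, 0.6, 1.3])
est_b  = np.array([-1.1, -0.5, 0.2, 0.5, 1.6])
mse, bias = mse_and_bias(est_b, true_b)
print(f"MSE = {mse:.3f}, bias = {bias:.3f}")
```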
{"title":"FIPC Linking Across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions","authors":"S. Kim, Ki Cole, M. Mwavita","doi":"10.1080/15305058.2018.1428980","DOIUrl":"https://doi.org/10.1080/15305058.2018.1428980","url":null,"abstract":"This study investigated the effects of linking potentially multidimensional test forms using the fixed item parameter calibration. Forms had equal or unequal total test difficulty with and without confounding difficulty. The mean square errors and bias of estimated item and ability parameters were compared across the various confounding tests. The estimated discrimination parameters were influenced by the levels of correlation between dimensions. The mean square errors (MSEs) of the average of the true discrimination parameters with the estimated value were smallest when the correlation equaled 0; however, the MSEs of the multidimensional discrimination parameter were smallest when the correlation was larger than 0. The estimated difficulty parameters were highly affected by different amount of confounding difficulty within dimensions. Furthermore, the MSEs of the average of the true ability parameters on the first and second dimensions with the estimated ability were smaller than those from the ability parameter on each dimension for all conditions. The pattern varied according to the number of common items, and the measures of MSE and squared bias were relatively consistent across forms at the same level of correlation, except for the condition where the correlation was 0 and the number of common items was 8.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1428980","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59952903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effects of Situational Judgment Test Format on Reliability and Validity
Pub Date: 2018-02-23 | DOI: 10.1080/15305058.2018.1428981
Michelle P. Martín‐Raugh, Cristina Anguiano-Carrasco, Teresa Jackson, Meghan W. Brenneman, Lauren M. Carney, Patrick V. Barnwell, Jonathan F. Kochert
Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTs) in that they present test takers with edited critical incidents and simply ask them to read over the action described and evaluate it according to its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs is thus far extremely limited. The study reported here directly compares forms of an SRSJT and an MRSJT and explores the reliability, convergent validity, and predictive validity of each format. Results from this investigation provide preliminary evidence that SRSJTs may produce internal consistency reliability, convergent validity, and predictive validity estimates comparable to those achieved with many traditional MRSJTs. We conclude by discussing practical implications for personnel selection and assessment, and for future research in psychological science more broadly.
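As a sketch of the psychometric comparisons involved, the code below computes internal consistency (Cronbach's alpha) and a predictive validity correlation for a simulated single-response SJT form; the data-generating choices and the criterion are illustrative assumptions, not the study's measures.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
n, k = 300, 12
common = rng.normal(0, 1, n)

# Hypothetical item-level effectiveness ratings for an SRSJT form and a
# criterion measure (e.g., a supervisor rating).
srsjt_items = common[:, None] + rng.normal(0, 1.2, (n, k))
criterion = 0.4 * common + rng.normal(0, 1, n)

alpha = cronbach_alpha(srsjt_items)
predictive_r = np.corrcoef(srsjt_items.sum(axis=1), criterion)[0, 1]
print(f"alpha = {alpha:.2f}, predictive r = {predictive_r:.2f}")
```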
{"title":"Effects of Situational Judgment Test Format on Reliability and Validity","authors":"Michelle P. Martín‐Raugh, Cristina Anguiano-Carrsaco, Teresa Jackson, Meghan W. Brenneman, Lauren M. Carney, Patrick V. Barnwell, Jonathan F. Kochert","doi":"10.1080/15305058.2018.1428981","DOIUrl":"https://doi.org/10.1080/15305058.2018.1428981","url":null,"abstract":"Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTS) in that they present test takers with edited critical incidents and simply ask test takers to read over the action described and evaluate it according to its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs is thus far extremely limited. The study reported here directly compares forms of a SRSJT and MRSJT and explores the reliability, convergent validity, and predictive validity of each format. Results from this investigation present preliminary evidence to suggest SRSJTs may produce internal consistency reliability, convergent validity, and predictive validity estimates that are comparable to those achieved with many traditional MRSJTs. We conclude by discussing practical implications for personnel selection and assessment, and future research in psychological science more broadly.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2018.1428981","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48386096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adding Value to Second-Language Listening and Reading Subscores: Using a Score Augmentation Approach
Pub Date: 2018-01-04 | DOI: 10.1080/15305058.2017.1407766
S. Papageorgiou, Ikkyu Choi
This study examined whether reporting subscores for groups of items within a test section assessing a second-language modality (specifically, reading or listening comprehension) added value from a measurement perspective to the information already provided by the section scores. We analyzed the responses of 116,489 test takers to reading and listening items from operational administrations of two large-scale international tests of English as a foreign language. To “strengthen” the reliability of the subscores, and thus improve their added value, we applied a score augmentation method (Haberman, 2008). In doing so, our aim was to examine whether reporting augmented subscores for specific groups of reading and listening items could improve the added value of these subscores and consequently justify providing more fine-grained information about test taker performance. Our analysis indicated that, in general, there was a lack of support for reporting subscores from a psychometric perspective, and that score augmentation only marginally improved the added value of the subscores. We discuss several implications of our findings for test developers wishing to report more fine-grained information about test performance. We conclude by arguing that research on how best to report such refined feedback should remain a focus of future efforts related to second-language proficiency tests.
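A simplified version of the value-added comparison underlying Haberman's (2008) approach can be sketched under classical test theory with uncorrelated errors: a subscore adds value when it predicts its own true score better than the total score does. The function below illustrates that logic only; it is not the operational procedure used in the study, and the example data and reliability are invented.

```python
import numpy as np

def prmse_value_added(subscore, total, rel_sub):
    """Does the subscore predict its own true score better than the total score does?

    A simplified take on Haberman's (2008) PRMSE comparison, assuming classical
    test theory with uncorrelated errors; rel_sub is the subscore's reliability,
    estimated elsewhere.
    """
    subscore = np.asarray(subscore, float)
    total = np.asarray(total, float)
    var_s = subscore.var(ddof=1)
    var_x = total.var(ddof=1)
    cov_sx = np.cov(subscore, total, ddof=1)[0, 1]

    var_true_s = rel_sub * var_s                     # true-subscore variance
    cov_true_s_x = cov_sx - (1.0 - rel_sub) * var_s  # subtract error variance unique to the subscore

    prmse_from_subscore = rel_sub
    prmse_from_total = cov_true_s_x ** 2 / (var_true_s * var_x)
    return prmse_from_subscore > prmse_from_total, prmse_from_subscore, prmse_from_total

rng = np.random.default_rng(3)
reading_sub = rng.normal(25, 5, 1000)
total_score = reading_sub + rng.normal(50, 8, 1000)
print(prmse_value_added(reading_sub, total_score, rel_sub=0.80))
```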
{"title":"Adding Value to Second-Language Listening and Reading Subscores: Using a Score Augmentation Approach","authors":"S. Papageorgiou, Ikkyu Choi","doi":"10.1080/15305058.2017.1407766","DOIUrl":"https://doi.org/10.1080/15305058.2017.1407766","url":null,"abstract":"This study examined whether reporting subscores for groups of items within a test section assessing a second-language modality (specifically reading or listening comprehension) added value from a measurement perspective to the information already provided by the section scores. We analyzed the responses of 116,489 test takers to reading and listening items from operational administrations of two large-scale international tests of English as a foreign language. To “strengthen” the reliability of the subscores, and thus improve their added value, we applied a score augmentation method (Haberman, 2008). In doing so, our aim was to examine whether reporting augmented subscores for specific groups of reading and listening items could improve the added value of these subscores and consequently justify providing more fine-grained information about test taker performance. Our analysis indicated that in general, there was lack of support for reporting subscores from a psychometric perspective, and that score augmentation marginally improved the added value of the subscores. We discuss several implications of our findings for test developers wishing to report more fine-grained information about test performance. We conclude by arguing that research on how to best report such refined feedback should remain the focus of future efforts related to second-language proficiency tests.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1407766","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43463605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring a Source of Uneven Score Equity across the Test Score Range
Pub Date: 2018-01-02 | DOI: 10.1080/15305058.2017.1396463
A. Huggins-Manley, Yuxi Qiu, Randall D. Penfield
Score equity assessment (SEA) refers to an examination of the population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring in particular test score ranges but absent for examinees scoring in other ranges. No studies to date have examined why score equity can be inconsistent across the score range of some tests. The purpose of this study is to explore a source of uneven subpopulation score equity across the score range of a test. It is hypothesized that the difficulty of anchor items displaying differential item functioning (DIF) is directly related to the score location at which issues of score inequity are observed. The simulation study supports the hypothesis that the difficulty of DIF items has a systematic impact on the uneven nature of conditional score equity.
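Conditional (score-level) equity can be illustrated by comparing subgroup equating functions point by point across the raw-score range; the sketch below uses made-up equating functions and an arbitrary flagging threshold.

```python
import numpy as np

# Hypothetical equated scores at each raw-score point for two subpopulations,
# e.g., from separate subgroup equatings of the same anchor design. The values
# and the 0.5-point flagging threshold are illustrative, not from the study.
raw_scores = np.arange(0, 41)
equated_group_1 = raw_scores + 0.3 * np.sin(raw_scores / 6.0)
equated_group_2 = raw_scores + 0.3 * np.sin(raw_scores / 6.0) + np.where(raw_scores < 15, 0.8, 0.1)

conditional_diff = equated_group_1 - equated_group_2
flagged = raw_scores[np.abs(conditional_diff) > 0.5]
print("Score points with notable subgroup equating differences:", flagged)
```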
{"title":"Exploring a Source of Uneven Score Equity across the Test Score Range","authors":"A. Huggins-Manley, Yuxi Qiu, Randall D. Penfield","doi":"10.1080/15305058.2017.1396463","DOIUrl":"https://doi.org/10.1080/15305058.2017.1396463","url":null,"abstract":"Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No studies to date have performed research for the purpose of understanding why score equity can be inconsistent across the score range of some tests. The purpose of this study is to explore a source of uneven subpopulation score equity across the score range of a test. It is hypothesized that the difficulty of anchor items displaying differential item functioning (DIF) is directly related to the score location at which issues of score inequity are observed. The simulation study supports the hypothesis that the difficulty of DIF items has a systematic impact on the uneven nature of conditional score equity.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1396463","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45374977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spurious Latent Class Problem in the Mixed Rasch Model: A Comparison of Three Maximum Likelihood Estimation Methods under Different Ability Distributions
Pub Date: 2018-01-02 | DOI: 10.1080/15305058.2017.1312408
S. Şen
Recent research has shown that over-extraction of latent classes can occur in Bayesian estimation of the mixed Rasch model when the distribution of ability is non-normal. This study examined the effect of non-normal ability distributions on the number of latent classes extracted in the mixed Rasch model when it is estimated with maximum likelihood methods (conditional, marginal, and joint). Three information-criterion fit indices (the Akaike information criterion [AIC], the Bayesian information criterion [BIC], and the sample-size-adjusted BIC) were used in a simulation study and an empirical study. Findings showed that the spurious latent class problem arose with marginal maximum likelihood and joint maximum likelihood estimation. Conditional maximum likelihood estimation, however, showed no over-extraction problem with non-normal ability distributions.
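The three fit indices compared here are simple functions of a solution's log-likelihood, number of parameters, and sample size; the sketch below computes them for hypothetical 1-, 2-, and 3-class solutions (the log-likelihood values and parameter counts are invented for illustration).

```python
import numpy as np

def information_criteria(log_lik, n_params, n_obs):
    """AIC, BIC, and sample-size-adjusted BIC from a model's log-likelihood."""
    aic = -2 * log_lik + 2 * n_params
    bic = -2 * log_lik + n_params * np.log(n_obs)
    sabic = -2 * log_lik + n_params * np.log((n_obs + 2) / 24)  # Sclove's adjustment
    return aic, bic, sabic

# Hypothetical log-likelihoods and parameter counts for 1-, 2-, and 3-class
# mixed Rasch solutions; the solution with the lowest criterion would be retained.
fits = {1: (-10450.0, 21), 2: (-10380.0, 43), 3: (-10368.0, 65)}
for classes, (ll, k) in fits.items():
    aic, bic, sabic = information_criteria(ll, k, n_obs=1000)
    print(f"{classes} classes: AIC={aic:.0f}, BIC={bic:.0f}, saBIC={sabic:.0f}")
```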
{"title":"Spurious Latent Class Problem in the Mixed Rasch Model: A Comparison of Three Maximum Likelihood Estimation Methods under Different Ability Distributions","authors":"S. Şen","doi":"10.1080/15305058.2017.1312408","DOIUrl":"https://doi.org/10.1080/15305058.2017.1312408","url":null,"abstract":"Recent research has shown that over-extraction of latent classes can be observed in the Bayesian estimation of the mixed Rasch model when the distribution of ability is non-normal. This study examined the effect of non-normal ability distributions on the number of latent classes in the mixed Rasch model when estimated with maximum likelihood estimation methods (conditional, marginal, and joint). Three information criteria fit indices (Akaike information criterion, Bayesian information criterion, and sample size adjusted BIC) were used in a simulation study and an empirical study. Findings of this study showed that the spurious latent class problem was observed with marginal maximum likelihood and joint maximum likelihood estimations. However, conditional maximum likelihood estimation showed no overextraction problem with non-normal ability distributions.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2018-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1312408","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48081560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}