Detection of Outliers in Anchor Items Using Modified Rasch Fit Statistics
Pub Date: 2021-10-02 | DOI: 10.1080/08957347.2021.1987901
Chunyan Liu, D. Jurich, C. Morrison, Irina Grabovsky
ABSTRACT The existence of outliers in the anchor items can be detrimental to the estimation of examinee ability and undermine the validity of score interpretation across forms, yet in practice anchor item performance can become distorted for various reasons. Through a simulation study varying sample size, proportion of outliers, direction of item difficulty drift, and magnitude of group differences, this study compares modified INFIT and OUTFIT Rasch statistics with the Logit Difference approach (predetermined cutoff values of 0.3 and 0.5) and the Robust z statistic (cutoff values of 1.645 and 2.7). The results suggest that the modified INFIT and OUTFIT statistics perform very similarly and outperform the other methods in all aspects, including sensitivity in flagging outliers, specificity in flagging non-outliers, recovery of the translation constant, and recovery of examinee ability, across all simulated conditions.
{"title":"Detection of Outliers in Anchor Items Using Modified Rasch Fit Statistics","authors":"Chunyan Liu, D. Jurich, C. Morrison, Irina Grabovsky","doi":"10.1080/08957347.2021.1987901","DOIUrl":"https://doi.org/10.1080/08957347.2021.1987901","url":null,"abstract":"ABSTRACT The existence of outliers in the anchor items can be detrimental to the estimation of examinee ability and undermine the validity of score interpretation across forms. However, in practice, anchor item performance can become distorted due to various reasons. This study compares the performance of modified INFIT and OUTFIT Rasch statistics with the Logit Difference approach with 0.3 and 0.5 as the predetermined cutoff values, and the Robust z statistic with 1.645 and 2.7 as the cutoff values through a simulation study by varying the sample size, proportion of outliers, item difficulty drift direction, and group difference magnitude. The results suggest that both modified INFIT and OUTFIT statistics perform very similarly and outperform the other methods in all aspects, including sensitivity of flagging outliers, specificity of flagging non-outliers, recovery of translation constant, and recovery of examinee ability in all simulated conditions.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"327 - 341"},"PeriodicalIF":1.5,"publicationDate":"2021-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47230980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Effect of Peer Assessment on Non-Cognitive Outcomes: A Meta-Analysis
Pub Date: 2021-07-03 | DOI: 10.1080/08957347.2021.1933980
Hongli Li, Jacquelyn A. Bialo, Yao Xiong, C. Hunter, Xiuyan Guo
ABSTRACT Peer assessment is increasingly being used as a pedagogical tool in classrooms. Participating in peer assessment may enhance student learning in both cognitive and non-cognitive aspects. In this study, we focused on non-cognitive aspects by performing a meta-analysis to synthesize the effect of peer assessment on students’ non-cognitive learning outcomes. After a systematic search, we included 43 effect sizes from 19 studies, which mostly involved learning strategies and academic mind-sets as non-cognitive outcomes. Using a random effects model, we found that students who had participated in peer assessment showed a 0.289 standard deviation unit improvement in non-cognitive outcomes as compared to students who had not participated in peer assessment. Further, we found that the effect of peer assessment on non-cognitive outcomes was significantly larger when both scores and comments were provided to students or when assessors and assessees were matched at random. Our findings can be used as a basis for further investigation into how best to use peer assessment as a learning tool, especially to promote non-cognitive development.
{"title":"The Effect of Peer Assessment on Non-Cognitive Outcomes: A Meta-Analysis","authors":"Hongli Li, Jacquelyn A. Bialo, Yao Xiong, C. Hunter, Xiuyan Guo","doi":"10.1080/08957347.2021.1933980","DOIUrl":"https://doi.org/10.1080/08957347.2021.1933980","url":null,"abstract":"ABSTRACT Peer assessment is increasingly being used as a pedagogical tool in classrooms. Participating in peer assessment may enhance student learning in both cognitive and non-cognitive aspects. In this study, we focused on non-cognitive aspects by performing a meta-analysis to synthesize the effect of peer assessment on students’ non-cognitive learning outcomes. After a systematic search, we included 43 effect sizes from 19 studies, which mostly involved learning strategies and academic mind-sets as non-cognitive outcomes. Using a random effects model, we found that students who had participated in peer assessment showed a 0.289 standard deviation unit improvement in non-cognitive outcomes as compared to students who had not participated in peer assessment. Further, we found that the effect of peer assessment on non-cognitive outcomes was significantly larger when both scores and comments were provided to students or when assessors and assessees were matched at random. Our findings can be used as a basis for further investigation into how best to use peer assessment as a learning tool, especially to promote non-cognitive development.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"179 - 203"},"PeriodicalIF":1.5,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1933980","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44728944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test
Pub Date: 2021-07-03 | DOI: 10.1080/08957347.2021.1933982
J. H. Lozano, J. Revuelta
ABSTRACT The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework compared to the traditional frequentist approach are discussed. The application of the model is illustrated with real data from a logical ability test. The results show how the incorporation of previous practice into the linear logistic model improves the fit of the model as well as the prediction of the Rasch item difficulty estimates. The model provides evidence of learning associated with two of the logic operations involved in the items, which supports the hypothesis of practice effects in deductive reasoning tasks.
{"title":"Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test","authors":"J. H. Lozano, J. Revuelta","doi":"10.1080/08957347.2021.1933982","DOIUrl":"https://doi.org/10.1080/08957347.2021.1933982","url":null,"abstract":"ABSTRACT The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework compared to the traditional frequentist approach are discussed. The application of the model is illustrated with real data from a logical ability test. The results show how the incorporation of previous practice into the linear logistic model improves the fit of the model as well as the prediction of the Rasch item difficulty estimates. The model provides evidence of learning associated with two of the logic operations involved in the items, which supports the hypothesis of practice effects in deductive reasoning tasks.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"223 - 235"},"PeriodicalIF":1.5,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1933982","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45671064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development and Use of Anchoring Vignettes: Psychometric Investigations and Recommendations for a Nonparametric Approach
Pub Date: 2021-07-03 | DOI: 10.1080/08957347.2021.1933983
HyeSun Lee, W. Smith, Angel Martinez, Heather Ferris, Joe Bova
ABSTRACT The aim of the current research was to provide recommendations to facilitate the development and use of anchoring vignettes (AVs) for cross-cultural comparisons in education. Study 1 identified six factors leading to order violations and ties in AV responses based on cognitive interviews with 15-year-old students. The factors were categorized into three domains: varying levels of AV format familiarity, differential interpretations of content, and individual differences in processing information. To identify the most appropriate approach for treating order violations and the most appropriate re-scaling method, Study 2 conducted Monte Carlo simulations manipulating three factors and incorporating five response styles. Study 2 found that the AV approach improved accuracy in score estimation by successfully controlling for response styles. The results also revealed that the reordering approach to treating order violations, combined with the re-scaling method that assigns the middle value among the possible scores, produced the most accurate estimation. Along with strategies for developing AVs, strengths and limitations of the implemented nonparametric AV approach were discussed in comparison to item response theory modeling of response styles.
{"title":"Development and Use of Anchoring Vignettes: Psychometric Investigations and Recommendations for a Nonparametric Approach","authors":"HyeSun Lee, W. Smith, Angel Martinez, Heather Ferris, Joe Bova","doi":"10.1080/08957347.2021.1933983","DOIUrl":"https://doi.org/10.1080/08957347.2021.1933983","url":null,"abstract":"ABSTRACT The aim of the current research was to provide recommendations to facilitate the development and use of anchoring vignettes (AVs) for cross-cultural comparisons in education. Study 1 identified six factors leading to order violations and ties in AV responses based on cognitive interviews with 15-year-old students. The factors were categorized into three domains: varying levels of AV format familiarity, differential interpretations of content, and individual differences in processing information. To inform the most appropriate approach to treat order violations and re-scaling method, Study 2 conducted Monte Carlo simulations with the manipulation of three factors and incorporation of five responses styles. Study 2 found that the AV approach improved accuracy in score estimation by successfully controlling for response styles. The results also revealed the reordering approach to treat order violations produced the most accurate estimation combined with the re-scaling method assigning the middle value among possible scores. Along with strategies to develop AVs, strengths and limitations of the implemented nonparametric AV approach were discussed in comparison to item response theory modeling for response styles.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"204 - 222"},"PeriodicalIF":1.5,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1933983","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43182394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Estimation of Mean Ability Growth Using Vertical Scaling
Pub Date: 2021-06-15 | DOI: 10.1080/08957347.2021.1933981
Jonas Bjermo, Frank Miller
ABSTRACT In recent years, interest in measuring growth in student ability in various subjects between different grades in school has increased, so good precision in the estimated growth is important. This paper uses a simulation study to compare estimation methods and test designs with respect to the precision and bias of the estimated growth in mean ability between two groups of students that differ substantially. One- and two-parameter item response models are assumed, and the estimated abilities are vertically scaled using the non-equivalent anchor test design by estimating the abilities in a single run, so-called concurrent calibration. The connection between the test design and the Fisher information is also discussed. The results indicate that the expected a posteriori estimation method is preferred when estimating differences in mean ability between groups. Results also indicate that a test design with common items of medium difficulty leads to better precision, which coincides with previous results from horizontal equating.
{"title":"Efficient Estimation of Mean Ability Growth Using Vertical Scaling","authors":"Jonas Bjermo, Frank Miller","doi":"10.1080/08957347.2021.1933981","DOIUrl":"https://doi.org/10.1080/08957347.2021.1933981","url":null,"abstract":"ABSTRACT In recent years, the interest in measuring growth in student ability in various subjects between different grades in school has increased. Therefore, good precision in the estimated growth is of importance. This paper aims to compare estimation methods and test designs when it comes to precision and bias of the estimated growth of mean ability between two groups of students that differ substantially. This is performed by a simulation study. One- and two-parameter item response models are assumed and the estimated abilities are vertically scaled using the non-equivalent anchor test design by estimating the abilities in one single run, so-called concurrent calibration. The connection between the test design and the Fisher information is also discussed. The results indicate that the expected a posteriori estimation method is preferred when estimating differences in mean ability between groups. Results also indicate that a test design with common items of medium difficulty leads to better precision, which coincides with previous results from horizontal equating.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"163 - 178"},"PeriodicalIF":1.5,"publicationDate":"2021-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1933981","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43416092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Violation of Conditional Independence in the Many-Facets Rasch Model
Pub Date: 2021-04-03 | DOI: 10.1080/08957347.2021.1890743
Christine E. DeMars
ABSTRACT Estimation of parameters for the many-facets Rasch model requires that, conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch, 2PL, and 3PL models, but it becomes more complex with more facets. To show how violation of conditional independence may be exhibited, three scenarios with different types of dependency are developed: (a) raters rating the same work, (b) a residual ability shared by two tasks, and (c) the score on one task depending on the observed score on a previous task.
{"title":"Violation of Conditional Independence in the Many-Facets Rasch Model","authors":"Christine E. DeMars","doi":"10.1080/08957347.2021.1890743","DOIUrl":"https://doi.org/10.1080/08957347.2021.1890743","url":null,"abstract":"ABSTRACT Estimation of parameters for the many-facets Rasch model requires that conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch models and 2PL and 3PL models, but it becomes more complex with more facets. To show how violation of conditional independence may be exhibited, three scenarios with different types of dependency are developed: (a) raters rating the same work, (b) a residual ability shared by two tasks, and (c) score on one task dependent on observed score on a previous task.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"122 - 138"},"PeriodicalIF":1.5,"publicationDate":"2021-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1890743","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43821816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coefficient β As Extension of KR-21 Reliability for Summed and Scaled Scores for Polytomously-scored Tests
Pub Date: 2021-04-03 | DOI: 10.1080/08957347.2021.1890740
Rashid S. Almehrizi
ABSTRACT KR-21 reliability and its extension (coefficient α) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores on dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article extends KR-21 to coefficient β to estimate the reliability of summed scores for tests with any item scoring pattern under the same assumptions as KR-21. The article also presents a coefficient to estimate this reliability for nonlinearly scaled scores for any type of items. The article shows that the proposed coefficient is slightly different from the classical index of reliability, and results show that it is not equal to the classical index of reliability for summed scores and scaled scores. Moreover, results using real data on psychological instruments reveal that different score scales yield different coefficient reliability estimates, which are less than coefficient Gα by a function related to form-to-form differences in averages.
{"title":"Coefficient β As Extension of KR-21 Reliability for Summed and Scaled Scores for Polytomously-scored Tests","authors":"Rashid S. Almehrizi","doi":"10.1080/08957347.2021.1890740","DOIUrl":"https://doi.org/10.1080/08957347.2021.1890740","url":null,"abstract":"ABSTRACT KR-21 reliability and its extension (coefficient α) gives the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article extends KR-21 to coefficient to estimate reliability of summed scores for test with any item scoring pattern under the same assumption of KR-21. Also, the article presents coefficient to estimate this reliability for nonlinearly scaled scores for any type of items. The article shows that coefficient is slightly different from the classical index of reliability. Results show that coefficient is not equal to the classical index of reliability for summed scores and scaled scores. Moreover, results using real data on psychological instruments reveal that different score scales yield different coefficient reliability estimates that are less than coefficient Gα by a function related to form-to-form differences in averages.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"139 - 149"},"PeriodicalIF":1.5,"publicationDate":"2021-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44733498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examining Three Learning Progressions in Middle-school Mathematics for Formative Assessment
Pub Date: 2021-03-08 | DOI: 10.1080/08957347.2021.1890744
Duy N. Pham, C. Wells, Malcolm I. Bauer, E. Wylie, S. Monroe
ABSTRACT Assessments built on a theory of learning progressions are promising formative tools to support learning and teaching. The quality and usefulness of those assessments depend, in large part, on the validity of the theory-informed inferences about student learning made from the assessment results. In this study, we introduced an approach to address an important challenge related to examining theorized level links across progressions. We applied this approach to analyze response data for three learning progressions in middle-school mathematics: Equality and Variable, Functions and Linear Functions, and Proportional Reasoning. Multidimensional item response theory models were fit to the data to evaluate the postulated learning levels and level links. Our findings supported the theoretical hypotheses for Functions and Linear Functions and for Proportional Reasoning; thus, items measuring these progressions can be used to develop formative assessment tasks and assist instructional practices. Our findings did not support the theory underlying Equality and Variable. Implications for assessment developers and users, and future directions for research, were discussed.
{"title":"Examining Three Learning Progressions in Middle-school Mathematics for Formative Assessment","authors":"Duy N. Pham, C. Wells, Malcolm I. Bauer, E. Wylie, S. Monroe","doi":"10.1080/08957347.2021.1890744","DOIUrl":"https://doi.org/10.1080/08957347.2021.1890744","url":null,"abstract":"ABSTRACT Assessments built on a theory of learning progressions are promising formative tools to support learning and teaching. The quality and usefulness of those assessments depend, in large part, on the validity of the theory-informed inferences about student learning made from the assessment results. In this study, we introduced an approach to address an important challenge related to examining theorized level links across progressions. We adopted the method to analyze response data for three learning progressions: Equality and Variable, Functions and Linear Functions, and Proportional Reasoning in middle-school mathematics . Multidimensional item response theory models were fit to the data to evaluate the postulated learning levels and level links. Our findings supported the theoretical hypotheses for Functions and Linear Functions, and Proportional Reasoning. Thus, items measuring these progressions can be used to develop formative assessment tasks, and assist instructional practices. Our findings did not support the theory underlying Equality and Variable. Implications for assessment developers and users, and future directions for research were discussed.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"107 - 121"},"PeriodicalIF":1.5,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1890744","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48413185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Test-Taking Effort in Low-Stakes Group-Based Educational Testing: A Meta-Analysis of Interventions
Pub Date: 2021-03-01 | DOI: 10.1080/08957347.2021.1890741
Joseph A. Rios
ABSTRACT Four decades of research have shown that students’ low test-taking effort is a serious threat to the validity of score-based inferences from low-stakes, group-based educational assessments. This meta-analysis sought to identify effective interventions for improving students’ test-taking effort in such contexts. Included studies (a) used a treatment-control group design; (b) administered a low-stakes group-based educational assessment; (c) employed an intervention to improve test-taking motivation; and (d) evaluated test-taking effort and/or test performance as outcomes. The analysis included 53 studies (N = 59,096) that produced 60 and 105 effect sizes for test-taking effort and test performance, respectively. On average, interventions were found to improve test-taking effort and test performance by 0.13 standard deviations (SD) each. The largest gains in test-taking effort were observed for interventions that provided external incentives, followed by those that increased test relevance, while no significant differences were found between these two intervention types in improving test performance. Furthermore, negligible impact was detected on both dependent variables for interventions that modified assessment design or promised feedback. Recommendations for future research and practice are discussed.
{"title":"Improving Test-Taking Effort in Low-Stakes Group-Based Educational Testing: A Meta-Analysis of Interventions","authors":"Joseph A. Rios","doi":"10.1080/08957347.2021.1890741","DOIUrl":"https://doi.org/10.1080/08957347.2021.1890741","url":null,"abstract":"ABSTRACT Four decades of research have shown that students’ low test-taking effort is a serious threat to the validity of score-based inferences from low-stakes, group-based educational assessments. This meta-analysis sought to identify effective interventions for improving students’ test-taking effort in such contexts. Included studies (a) used a treatment-control group design; (b) administered a low-stakes group-based educational assessment; (c) employed an intervention to improve test-taking motivation; and (d) evaluated test-taking effort and/or test performance as outcomes. The analysis included 53 studies (N = 59,096) that produced 60 and 105 effect sizes of test-taking effort and test performance, respectively. On average, interventions were found to improve test-taking effort and test performance by 0.13 standard deviations (SD) each. The largest gains in test-taking effort were observed when providing external incentives followed by increasing test relevance, while no significant differences were found between these two intervention types in improving test performance. Furthermore, negligible impact was detected on both dependent variables for interventions that modified assessment design or promised feedback. Recommendations for future research and practice are discussed.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"85 - 106"},"PeriodicalIF":1.5,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1890741","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59806077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Method for Identifying Partial Test-Taking Engagement
Pub Date: 2021-03-01 | DOI: 10.1080/08957347.2021.1890745
S. Wise, Megan Kuhfeld
ABSTRACT Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not affected by disengagement. A recent study provided evidence that the responses used in E-M scoring can sometimes reflect less-than-full, or partial engagement. In these instances, E-M scores will be less trustworthy because they can still be distorted by disengagement. This paper describes a method for identifying partial engagement and provides validation evidence to support its use and interpretation. When test events indicate the presence of partial engagement, effort-moderated scores should be interpreted cautiously.
{"title":"A Method for Identifying Partial Test-Taking Engagement","authors":"S. Wise, Megan Kuhfeld","doi":"10.1080/08957347.2021.1890745","DOIUrl":"https://doi.org/10.1080/08957347.2021.1890745","url":null,"abstract":"ABSTRACT Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not affected by disengagement. A recent study provided evidence that the responses used in E-M scoring can sometimes reflect less-than-full, or partial engagement. In these instances, E-M scores will be less trustworthy because they can still be distorted by disengagement. This paper describes a method for identifying partial engagement and provides validation evidence to support its use and interpretation. When test events indicate the presence of partial engagement, effort-moderated scores should be interpreted cautiously.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"34 1","pages":"150 - 161"},"PeriodicalIF":1.5,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/08957347.2021.1890745","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44049497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}