Pub Date: 2024-02-02 | DOI: 10.1080/08957347.2024.2311927
Comparing Examinee-Based and Response-Based Motivation Filtering Methods in Remote Low-Stakes Testing
Sarah Alahmadi, Christine E. DeMars
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote...
Pub Date: 2023-12-06 | DOI: 10.1080/08957347.2023.2274573
Analyzing Complete Generalizability Theory Designs Using Structural Equation Models
Walter P. Vispoel, Hyeri Hong, Hyeryung Lee, Terrence D. Jorgensen
We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software (lavaan in R), compare results to those obtained from numerous ANOVA-based pac...
Pub Date: 2023-11-19 | DOI: 10.1080/08957347.2023.2274570
Validity: An Integrated Approach to Test Score Meaning and Use, by Gregory J. Cizek, New York, Routledge, 2020, 190 pp., 55.00 (Paperback)
Tony Albano
Published in Applied Measurement in Education (Ahead of Print, 2023)
Pub Date: 2023-11-16 | DOI: 10.1080/08957347.2023.2274565
Recruitment and Retention of Racially and Ethnically Minoritized Graduate Students in Educational Measurement Programs
Jennifer Randall, Joseph Rios
Building on the extant literature on recruitment and retention within the field of STEM and undergraduate education, we sought to explore the recruitment and retention experiences of racially and e...
Pub Date: 2023-11-08 | DOI: 10.1080/08957347.2023.2274567
Detecting Item Parameter Drift in Small Sample Rasch Equating
Daniel Jurich, Chunyan Liu
ABSTRACT: Screening items for parameter drift helps protect against serious validity threats and ensure score comparability when equating forms. Although many high-stakes credentialing examinations operate with small sample sizes, few studies have investigated methods to detect drift in small sample equating. This study demonstrates that several newly researched drift detection strategies can improve equating accuracy under certain conditions with small samples where some anchor items display item parameter drift. Results showed that the recently proposed mINFIT and mOUTFIT methods, as well as the more conventional Robust-z, helped mitigate the adverse effects of drifting anchor items in conditions with higher drift levels or with more than 75 examinees. In contrast, the Logit Difference approach excessively removed invariant anchor items. The discussion provides recommendations on how practitioners working with small samples can use the results to make more informed decisions regarding item parameter drift.

Disclosure statement: No potential conflict of interest was reported by the author(s).

Supplementary material: Supplemental data for this article can be accessed online at https://doi.org/10.1080/08957347.2023.2274567

Notes:
1. In certain testing designs, some items may be reused as non-anchor items on future forms. Although IPD can occur on those items, we use the traditional IPD definition as specific to differential functioning in the items reused to serve as the equating anchor set.
2. In IRT, the old form anchor item parameter estimates can also come from a pre-calibrated bank. However, we use the old and new form terminology because the simulation design involves directly equating to a previous form.
3. For example, if an item drifted from b = 0 to b = 1 between Forms 1 and 2 in the 1.0 magnitude condition, it would be treated as having a true b of 1.0 if selected for Form 3.
Pub Date: 2023-11-08 | DOI: 10.1080/08957347.2023.2274572
Bayesian Logistic Regression: A New Method to Calibrate Pretest Items in Multistage Adaptive Testing
TsungHan Ho
ABSTRACT: An operational multistage adaptive test (MST) requires a large item bank and a continuing effort to replenish it, given long-term concerns about test security and validity. New items should be pretested and linked to the item bank before being used operationally. Fluctuations in linking item volume in MST, however, call into question the quality of the link to the reference scale. In this study, various calibration/linking methods, along with a newly proposed Bayesian logistic regression (BLR) method, were evaluated against the test characteristic curve method using simulated MST response data, in terms of item parameter recovery. Results for the BLR method were promising because of its estimation stability and robustness across the studied conditions. The findings of the present study should help inform practitioners about the utility of implementing the pretest item calibration method in MST.

Disclosure statement: No potential conflict of interest was reported by the author(s).
Pub Date: 2023-11-06 | DOI: 10.1080/08957347.2023.2274568
Change in Engagement During Test Events: An Argument for Weighted Scoring?
Steven L. Wise, G. Gage Kingsbury, Meredith L. Langi
ABSTRACT: Recent research has provided evidence that performance change during a student's test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance, lower-performing portions are less likely to reflect maximum performance than higher-performing portions. This empirical study explored the use of differential weighting of item responses during scoring, with weighting schemes representing either declining or increasing performance. Results indicated that weighted scoring could substantially decrease the score distortion due to disengagement factors and thereby improve test score validity. The study findings support the use of scoring procedures that manage disengagement by adapting to student test-taking behavior.

Disclosure statement: The authors have no known conflicts of interest to disclose.

Notes:
1. What constitutes "construct-irrelevant" depends on how the target construct is conceptualized. For example, Borgonovi and Biecek (2016) argued that academic endurance should be considered part of what PISA is intended to measure, because academic endurance is positively associated with a student's success later in life. It is unclear, however, how universally this conceptualization is adopted by those interpreting PISA results.
2. Such comparisons between first- and second-half test performance require the assumption that the two halves are reasonably equivalent in terms of content representation if IRT-based scoring is used.
3. Half-test MLE standard errors in Math and Reading were around 4.2 and 4.8, respectively.
4. These intervals are not intended to correspond to the critical regions used to assess statistical significance under the AMC method. For example, classifying PD < -10 points as a large decline represents a less conservative criterion than the critical region used by Wise and Kingsbury (2022).
Pub Date: 2023-06-08 | DOI: 10.1080/08957347.2023.2222031 | Vol. 36, pp. 255-268
The Promise of Assessments That Advance Social Justice: An Indigenous Example
Pōhai Kūkea Shultz, Kerry S. Englert
ABSTRACT: In the United States, systemic racism against people of color was brought to the forefront of discourse throughout 2020, highlighting the ongoing inequities faced by intentionally marginalized groups in policing, health, and education. No community of color is immune from these inequities, and the activism in 2020 and the consequences of the pandemic have made systemic inequities impossible to ignore. In the Hawaiʻi context, social and racial injustice has resulted in cultural and language loss (among other markers of colonization), but it is within this loss that we can see the potential for the most significant evolution of assessment practices that champion self-determination and social justice. We illustrate how injustices can be addressed through the development of assessments centered in advocacy of and accountability to our communities of color. It is time for us to reimagine what self-determination and social justice in all assessment systems can and should look like.
Pub Date: 2023-05-31 | DOI: 10.1080/08957347.2023.2214656 | Vol. 36, pp. 193-215
The Standards Will Never Be Enough: A Racial Justice Extension
Mya Poe, M. Oliveri, N. Elliot
ABSTRACT: Since 1952, the Standards for Educational and Psychological Testing has provided criteria for developing and evaluating educational and psychological tests and testing practice. Yet we argue that the foundations, operations, and applications in the Standards are no longer sufficient to meet the current U.S. testing demands for fairness for all test takers. We propose racial justice extensions as principled ways to extend the Standards, through intentional actions focused on race and targeted at educational policies, processes, and outcomes in specific settings. To inform these extensions, we focus on four social-justice concepts: intersectionality, derived from Black Feminist Theory; responsibility, derived from moral philosophy; disparate impact, derived from legal reasoning; and situatedness, derived from social learning theories. We demonstrate these extensions and concepts in action by applying them to case studies of nursing licensure and placement testing.
Pub Date: 2023-05-27 | DOI: 10.1080/08957347.2023.2217555 | Vol. 36, pp. 216-241
Shifting Educational Measurement from an Agent of Systemic Racism to an Anti-Racist Endeavor
Michaeline Russell
ABSTRACT: In recent years, issues of race, racism, and social justice have garnered increased attention across the nation. Although some aspects of social justice, particularly cultural sensitivity and test bias, have received similar attention within the field of educational measurement, a sharp focus on racism has eluded the field. This manuscript focuses narrowly on racism. Drawing on an expansive body of work in the field of sociology, several key theories of race and racism advanced over the past century are presented. Elements of these theories are then integrated into a model of systemic racism. This model is used to identify some of the ways in which educational measurement supports systemic racism as it operates in the United States. I then explore ways in which an anti-racist frame could be applied to combat the system of racism and reorient our work to support racial liberation.