Investigating the Effects of Test Accommodations with Process Data for English Learners in a Mathematics Assessment
Pub Date: 2021-09-29 | DOI: 10.1080/10627197.2021.1982693
M. Wolf, Hanwook Yoo, Danielle Guzman-Orth, J. Abedi
ABSTRACT Implementing a randomized controlled trial design, the present study investigated the effects of two types of accommodations, linguistic modification and a glossary, for English learners (ELs) taking a computer-based mathematics assessment. Process data including response time and clicks on glossary words were also examined to better interpret students’ interaction with the accommodations in the testing conditions. Regression and ANOVA analyses were performed with data from 513 students (189 ELs and 324 non-ELs) in Grade 9. No statistically significant accommodation effects were detected in this study. Process data revealed possible explanations (i.e., student engagement and glossary usage) for the nonsignificant results. Implications for future research on test accommodations for EL students are discussed.
{"title":"Investigating the Effects of Test Accommodations with Process Data for English Learners in a Mathematics Assessment","authors":"M. Wolf, Hanwook Yoo, Danielle Guzman-Orth, J. Abedi","doi":"10.1080/10627197.2021.1982693","DOIUrl":"https://doi.org/10.1080/10627197.2021.1982693","url":null,"abstract":"ABSTRACT Implementing a randomized controlled trial design, the present study investigated the effects of two types of accommodations, linguistic modification and a glossary, for English learners (ELs) taking a computer-based mathematics assessment. Process data including response time and clicks on glossary words were also examined to better interpret students’ interaction with the accommodations in the testing conditions. Regression and ANOVA analyses were performed with data from 513 students (189 ELs and 324 non-ELs) in Grade 9. No statistically significant accommodation effects were detected in this study. Process data revealed possible explanations (i.e., student engagement and glossary usage) for the nonsignificant results. Implications for future research on test accommodations for EL students are discussed.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45298505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Methodology for Determining and Validating Latent Factor Dimensionality of Complex Multi-Factor Science Constructs Measuring Knowledge-In-Use
Pub Date: 2021-09-05 | DOI: 10.1080/10627197.2021.1971966
Leonora Kaldaras, Hope O. Akaeze, J. Krajcik
ABSTRACT Deep science understanding is reflected in students’ ability to use content and skills when making sense of the world. Assessing deep understanding requires measuring complex constructs that combine elements of content and skills. To develop valid measures of complex constructs, we need to understand how their theoretical dimensionality, reflected in the integration of content and skills, is manifested in practice. This work is developed in the context of the Framework for K-12 Science Education and Next-Generation Science Standards (NGSS). We introduce a methodology that describes steps for creating a theoretical validity argument for measuring complex NGSS constructs, designing operational assessments based on this argument, and obtaining empirical evidence for the validity of the argument and assessments, focusing on how the theoretically suggested dimensionality of NGSS constructs is manifested in practice. Results have implications for developing valid NGSS assessments and reporting student progress on high-stakes and diagnostic evaluations.
{"title":"A Methodology for Determining and Validating Latent Factor Dimensionality of Complex Multi-Factor Science Constructs Measuring Knowledge-In-Use","authors":"Leonora Kaldaras, Hope O. Akaeze, J. Krajcik","doi":"10.1080/10627197.2021.1971966","DOIUrl":"https://doi.org/10.1080/10627197.2021.1971966","url":null,"abstract":"ABSTRACT Deep science understanding is reflected in students’ ability to use content and skills when making sense of the world. Assessing deep understanding requires measuring complex constructs that combine elements of content and skills. To develop valid measures of complex constructs, we need to understand how their theoretical dimensionality, reflected in the integration of content and skills, is manifested in practice. This work is developed in the context of the Framework for K-12 Science Education and Next-Generation Science Standards (NGSS). We introduce a methodology that describes steps for creating a theoretical validity argument for measuring complex NGSS constructs, designing operational assessments based on this argument, and obtaining empirical evidence for the validity of the argument and assessments, focusing on how theoretically suggested dimensionality of NGSS constructs is manifested in practice. Results have implications for developing valid NGSS assessments and reporting student progress on high-stakes and diagnostic evaluation.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45735723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing Source Evaluation Skills of Middle School Students Using Learning Progressions
Pub Date: 2021-09-01 | DOI: 10.1080/10627197.2021.1966299
Jesse R. Sparks, P. V. van Rijn, P. Deane
ABSTRACT Effectively evaluating the credibility and accuracy of multiple sources is critical for college readiness. We developed 24 source evaluation tasks spanning four predicted difficulty levels of a hypothesized learning progression (LP) and piloted these tasks to evaluate the utility of an LP-based approach to designing formative literacy assessments. Sixth, seventh, and eighth grade students (N = 360, 120 per grade) completed 12 of the 24 tasks in an online testing session. Analyses examined the tasks’ reliability and validity and whether patterns of performance aligned to predicted LP levels (i.e., recovery of the LP) using task progression maps derived from item response theory (IRT). Results suggested that the LP tasks were reliable and correlated with external measures; however, some lower level tasks proved unexpectedly difficult. Possible explanations for low performance are discussed, followed by implications for future LP and task revisions. This work provides a model for designing and evaluating LP-based literacy assessments.
{"title":"Assessing Source Evaluation Skills of Middle School Students Using Learning Progressions","authors":"Jesse R. Sparks, P. V. van Rijn, P. Deane","doi":"10.1080/10627197.2021.1966299","DOIUrl":"https://doi.org/10.1080/10627197.2021.1966299","url":null,"abstract":"ABSTRACT Effectively evaluating the credibility and accuracy of multiple sources is critical for college readiness. We developed 24 source evaluation tasks spanning four predicted difficulty levels of a hypothesized learning progression (LP) and piloted these tasks to evaluate the utility of an LP-based approach to designing formative literacy assessments. Sixth, seventh, and eighth grade students (N = 360, 120 per grade) completed 12 of the 24 tasks in an online testing session. Analyses examined the tasks’ reliability and validity and whether patterns of performance aligned to predicted LP levels (i.e., recovery of the LP) using task progression maps derived from item response theory (IRT). Results suggested that the LP tasks were reliable and correlated with external measures; however, some lower level tasks proved unexpectedly difficult. Possible explanations for low performance are discussed, followed by implications for future LP and task revisions. This work provides a model for designing and evaluating LP-based literacy assessments.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41628487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Intersectional Approach to DIF: Do Initial Findings Hold across Tests?
Pub Date: 2021-08-22 | DOI: 10.1080/10627197.2021.1965473
M. Russell, Olivia Szendey, Larry Kaplan
ABSTRACT Differential Item Functioning (DIF) analysis is commonly employed to examine potential bias produced by a test item. Since its introduction, DIF analyses have focused on potential bias related to broad categories of oppression, including gender, racial stratification, economic class, and ableness. More recently, efforts to examine the effects of oppression on valued life outcomes have employed an intersectional approach to more fully represent a person’s identity and capture the multiple, and often compound, impacts of oppression. The study presented here replicated an intersectional approach to DIF analyses to examine whether findings from a previous study that focused on a single grade-level achievement test generalized to other subject areas and grade levels. Findings indicate that the use of an intersectional approach is more sensitive to detecting potential item bias and that this increased sensitivity holds across the subject areas and grade levels examined.
{"title":"An Intersectional Approach to DIF: Do Initial Findings Hold across Tests?","authors":"M. Russell, Olivia Szendey, Larry Kaplan","doi":"10.1080/10627197.2021.1965473","DOIUrl":"https://doi.org/10.1080/10627197.2021.1965473","url":null,"abstract":"ABSTRACT Differential Item Function (DIF) analysis is commonly employed to examine potential bias produced by a test item. Since its introduction DIF analyses have focused on potential bias related to broad categories of oppression, including gender, racial stratification, economic class, and ableness. More recently, efforts to examine the effects of oppression on valued life-outcomes have employed an intersectional approach to more fully represent a person’s identity and capture the multiple, and often compound, impacts of oppression. The study presented here replicated an intersectional approach to DIF analyses to examine whether findings from a previous study that focused on a single grade-level achievement test generalized to other subject areas and grade levels. Findings indicate that the use of an intersectional approach is more sensitive to detecting potential item bias and that this increased sensitivity holds across the subject areas and grade levels examined.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42202501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beyond Agreement: Exploring Rater Effects in Large-Scale Mixed Format Assessments
Pub Date: 2021-08-17 | DOI: 10.1080/10627197.2021.1962277
Stefanie A. Wind, Wenjing Guo
ABSTRACT Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and reliability analyses, such as severity/leniency, centrality/extremism, and biases. Left undetected, these effects pose threats to fairness. We illustrate how rater effects analyses can be incorporated into scoring procedures for large-scale mixed-format assessments. We used data from the National Assessment of Educational Progress (NAEP) to illustrate relatively simple analyses that can provide insight into patterns of rater judgment that may warrant additional attention. Our results suggested that the NAEP raters exhibited generally defensible psychometric properties, while also exhibiting some idiosyncrasies that could inform scoring procedures. Similar procedures could be used operationally to inform the interpretation and use of rater judgments in large-scale mixed-format assessments.
{"title":"Beyond Agreement: Exploring Rater Effects in Large-Scale Mixed Format Assessments","authors":"Stefanie A. Wind, Wenjing Guo","doi":"10.1080/10627197.2021.1962277","DOIUrl":"https://doi.org/10.1080/10627197.2021.1962277","url":null,"abstract":"ABSTRACT Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and reliability analyses, such as severity/leniency, centrality/extremism, and biases. Left undetected, these effects pose threats to fairness. We illustrate how rater effects analyses can be incorporated into scoring procedures for large-scale mixed-format assessments. We used data from the National Assessment of Educational Progress (NAEP) to illustrate relatively simple analyses that can provide insight into patterns of rater judgment that may warrant additional attention. Our results suggested that the NAEP raters exhibited generally defensible psychometric properties, while also exhibiting some idiosyncrasies that could inform scoring procedures. Similar procedures could be used operationally to inform the interpretation and use of rater judgments in large-scale mixed-format assessments.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48562788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating the Use of Assessment Data by Primary School Teachers: Insights from a Large-scale Survey in Ireland
Pub Date: 2021-07-03 | DOI: 10.1080/10627197.2021.1917358
Vasiliki Pitsia, Anastasios Karakolidis, P. Lehane
ABSTRACT Evidence suggests that the quality of teachers’ instructional practices can be improved when these are informed by relevant assessment data. Drawing on a sample of 1,300 primary school teachers in Ireland, this study examined the extent to which teachers use standardized test results for instructional purposes as well as the role of several factors in predicting this use. Specifically, the study analyzed data from a cross-sectional survey that gathered information about teachers’ use of, experiences with, and attitudes toward assessment data from standardized tests. After taking other teacher and school characteristics into consideration, the analysis revealed that teachers with more positive attitudes toward standardized tests and those who were often engaged in some form of professional development on standardized testing tended to use assessment data to inform their teaching more frequently. Based on the findings, policy and practice implications are discussed.
{"title":"Investigating the Use of Assessment Data by Primary School Teachers: Insights from a Large-scale Survey in Ireland","authors":"Vasiliki Pitsia, Anastasios Karakolidis, P. Lehane","doi":"10.1080/10627197.2021.1917358","DOIUrl":"https://doi.org/10.1080/10627197.2021.1917358","url":null,"abstract":"ABSTRACT Evidence suggests that the quality of teachers’ instructional practices can be improved when these are informed by relevant assessment data. Drawing on a sample of 1,300 primary school teachers in Ireland, this study examined the extent to which teachers use standardized test results for instructional purposes as well as the role of several factors in predicting this use. Specifically, the study analyzed data from a cross-sectional survey that gathered information about teachers’ use of, experiences with, and attitudes toward assessment data from standardized tests. After taking other teacher and school characteristics into consideration, the analysis revealed that teachers with more positive attitudes toward standardized tests and those who were often engaged in some form of professional development on standardized testing tended to use assessment data to inform their teaching more frequently. Based on the findings, policy and practice implications are discussed.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10627197.2021.1917358","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42751021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Full-information Item Analysis to Improve Item Quality
Pub Date: 2021-07-03 | DOI: 10.1080/10627197.2021.1946390
T. Haladyna, Michael C. Rodriguez
ABSTRACT Full-information item analysis provides item developers and reviewers with comprehensive empirical evidence of item quality, including option response frequencies, the point-biserial index (PBI) for distractors, mean scores of respondents selecting each option, and option trace lines. The multi-serial index (MSI) is introduced as a more informative item-total correlation, accounting for variable distractor performance. The overall item PBI is empirically compared to the MSI. For items from an operational mathematics and reading test, poorly performing distractors are systematically removed to recompute the MSI, indicating improvements in item quality. Case studies for specific items with different characteristics are described to illustrate a variety of outcomes, focused on improving item discrimination. Full-information item analyses are presented for each case study item, providing clear examples of the interpretation and use of item analyses. A summary of recommendations for item analysts is provided.
{"title":"Using Full-information Item Analysis to Improve Item Quality","authors":"T. Haladyna, Michael C. Rodriguez","doi":"10.1080/10627197.2021.1946390","DOIUrl":"https://doi.org/10.1080/10627197.2021.1946390","url":null,"abstract":"ABSTRACT Full-information item analysis provides item developers and reviewers comprehensive empirical evidence of item quality, including option response frequency, point-biserial index (PBI) for distractors, mean-scores of respondents selecting each option, and option trace lines. The multi-serial index (MSI) is introduced as a more informative item-total correlation, accounting for variable distractor performance. The overall item PBI is empirically compared to the MSI. For items from an operational mathematics and reading test, poorly performing distractors are systematically removed to recompute the MSI, indicating improvements in item quality. Case studies for specific items with different characteristics are described to illustrate a variety of outcomes, focused on improving item discrimination. Full-information item analyses are presented for each case study item, providing clear examples of interpretation and use of item analyses. A summary of recommendations for item analysts is provided.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10627197.2021.1946390","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42811776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Impact of Disengaged Test Taking on a State’s Accountability Test Results
Pub Date: 2021-07-03 | DOI: 10.1080/10627197.2021.1956897
S. Wise, Sukkeun Im, Jay Lee
ABSTRACT This study investigated test-taking engagement on the Spring 2019 administration of a large-scale state summative assessment. Through the identification of rapid-guessing behavior – which is a validated indicator of disengagement – the percentage of Grade 8 test events with meaningful amounts of rapid guessing was 5.5% in mathematics, 6.7% in English Language Arts (ELA), and 3.5% in science. Disengagement rates on the state summative test were also found to vary materially across gender, ethnicity, Individualized Educational Plan (IEP) status, Limited English Proficient (LEP) status, free and reduced lunch (FRL) status, and disability status. However, school mean performance, proficiency rates, and relative ranking were only minimally affected by disengagement. Overall, results of this study indicate that disengagement has a material impact on individual state summative test scores, though its impact on score aggregations may be relatively minor.
{"title":"The Impact of Disengaged Test Taking on a State’s Accountability Test Results","authors":"S. Wise, Sukkeun Im, Jay Lee","doi":"10.1080/10627197.2021.1956897","DOIUrl":"https://doi.org/10.1080/10627197.2021.1956897","url":null,"abstract":"ABSTRACT This study investigated test-taking engagement on the Spring 2019 administration of a large-scale state summative assessment. Through the identification of rapid-guessing behavior – which is a validated indicator of disengagement – the percentage of Grade 8 test events with meaningful amounts of rapid guessing was 5.5% in mathematics, 6.7% in English Language Arts (ELA), and 3.5% in science. Disengagement rates on the state summative test were also found to vary materially across gender, ethnicity, Individualized Educational Plan (IEP) status, Limited English Proficient (LEP) status, free and reduced lunch (FRL) status, and disability status. However, school mean performance, proficiency rates, and relative ranking were only minimally affected by disengagement. Overall, results of this study indicate that disengagement has a material impact on individual state summative test scores, though its impact on score aggregations may be relatively minor.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10627197.2021.1956897","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45119734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing Quality of Teaching from Different Perspectives: Measurement Invariance across Teachers and Classes
Pub Date: 2021-04-03 | DOI: 10.1080/10627197.2020.1858785
G. Krammer, Barbara Pflanzl, Gerlinde Lenske, Johannes Mayr
ABSTRACT Comparing teachers’ self-assessments to classes’ assessments of the quality of teaching can offer insights for educational research and be a valuable resource for teachers’ continuous professional development. However, the quality of teaching needs to be measured in the same way across perspectives for this comparison to be meaningful. We used data from 622 teachers who self-assessed aspects of their quality of teaching and from their classes (12,229 students), who assessed the same aspects. Perspectives were compared with measurement invariance analyses. Teachers and classes agreed on the average level of instructional clarity but disagreed over teacher-student relationship and performance monitoring, suggesting that mean differences across perspectives may not be as consistent as the literature claims. Results showed a nonuniform measurement bias for only one item of instructional clarity, while measurement of the other aspects was directly comparable. We conclude that comparing teachers’ and classes’ perspectives on aspects of the quality of teaching is viable.
{"title":"Assessing Quality of Teaching from Different Perspectives: Measurement Invariance across Teachers and Classes","authors":"G. Krammer, Barbara Pflanzl, Gerlinde Lenske, Johannes Mayr","doi":"10.1080/10627197.2020.1858785","DOIUrl":"https://doi.org/10.1080/10627197.2020.1858785","url":null,"abstract":"ABSTRACT Comparing teachers’ self-assessment to classes’ assessment of quality of teaching can offer insights for educational research and be a valuable resource for teachers’ continuous professional development. However, the quality of teaching needs to be measured in the same way across perspectives for this comparison to be meaningful. We used data from 622 teachers self-assessing aspects of quality of teaching and of their classes (12229 students) assessing the same aspects. Perspectives were compared with measurement invariance analyses. Teachers and classes agreed on the average level of instructional clarity, and disagreed over teacher-student relationship and performance monitoring, suggesting that mean differences across perspectives may not be as consistent as the literature claims. Results showed a nonuniform measurement bias for only one item of instructional clarity, while measurement of the other aspects was directly comparable. We conclude the viability of comparing teachers’ and classes’ perspectives of aspects of quality of teaching.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10627197.2020.1858785","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42517403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting Retention in Higher Education from High-stakes Exams or School GPA
Pub Date: 2021-02-06 | DOI: 10.1080/10627197.2022.2130748
M. Meeter, M. V. van Brederode
ABSTRACT The transition from secondary to tertiary education varies from country to country. In many countries, secondary school is concluded with high-stakes national exams, or high-stakes entry tests are used for admission to tertiary education. In other countries, secondary-school grade point average (GPA) is the determining factor. In the Netherlands, both play a role. With administrative data on close to 180,000 students, we investigated whether national exam scores or secondary school GPA was a better predictor of first-year retention in tertiary education. For both university education and higher professional education, secondary school GPA was the better predictor of retention, to the extent that national exams did not explain any additional variance. Moreover, for students who failed their exams, were held back in secondary school for an additional year, and entered tertiary education one year later, GPA in the year of failure remained as predictive as it was for students who had passed their exams and started tertiary education immediately. National exam scores, on the other hand, had no predictive value at all for these students. It is concluded that secondary school GPA measures aspects of student performance that are not included in high-stakes national exams but that are predictive of subsequent success in tertiary education.
{"title":"Predicting Retention in Higher Education from high-stakes Exams or School GPA","authors":"M. Meeter, M. V. van Brederode","doi":"10.1080/10627197.2022.2130748","DOIUrl":"https://doi.org/10.1080/10627197.2022.2130748","url":null,"abstract":"ABSTRACT The transition from secondary to tertiary education varies from country to country. In many countries, secondary school is concluded with high-stakes, national exams, or high-stakes entry tests are used for admissions to tertiary education. In other countries, secondary-school grade point average (GPA) is the determining factor. In the Netherlands, both play a role. With administrative data of close to 180,000 students, we investigated whether national exam scores or secondary school GPA was a better predictor of tertiary first-year retention. For both university education and higher professional education, secondary school GPA was the better prediction of retention, to the extent that national exams did not explain any additional variance. Moreover, for students who failed their exam, being held back by the secondary school for an additional year and entering tertiary education one year later, GPA in the year of failure remained as predictive as for students who had passed their exams and started tertiary education immediately. National exam scores, on the other hand, had no predictive value at all for these students. It is concluded that secondary school GPA measures aspects of student performance that is not included in high-stakes national exams, but that are predictive of subsequent success in tertiary education.","PeriodicalId":46209,"journal":{"name":"Educational Assessment","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42962496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}