The purpose of implementing an assessment and accountability program in an urban school district is to improve student learning of worthwhile content.1 Current levels of achievement in most U.S. urban districts are unacceptably low. Average achievement test results conceal the fact that achievement levels of students of color are substantially lower than those of white students. Improvements are urgently needed. Assessment and accountability, by themselves, are unlikely to turn around the low levels of student achievement in urban settings.2 Supports must be put in place so that students and schools can be successful. Such supports must be an integral part of an effective assessment and accountability program. Nevertheless, high-stakes testing can be a powerful policy lever in a more comprehensive reform initiative.3 Some education researchers and practitioners believe that high-stakes testing leads to a dumbed-down curriculum and unfair penalties for students and schools.4 Others believe equally strongly that, without high-stakes testing, many schools will continue to provide inadequate opportunities to learn for students, especially students from low-income families. We believe that a
Andrew Porter and A. Chester. "Building a High-Quality Assessment and Accountability Program: The Philadelphia Example." Brookings Papers on Education Policy, 2002, pp. 285–337. https://doi.org/10.1353/PEP.2002.0016
By the spring of 2000, forty states had begun using student test scores to rate school performance. Twenty states have gone a step further and are attaching explicit monetary rewards or sanctions to a school's test performance. For example, California planned to spend $677 million on teacher incentives in 2001, providing bonuses of up to $25,000 to teachers in schools with the largest test score gains. We highlight an under-appreciated weakness of school accountability systems—the volatility of test score measures—and explore the implications of that volatility for the design of school accountability systems. The imprecision of test score measures arises from two sources. The first is sampling variation, which is a particularly striking problem in elementary schools. With the average elementary school containing only sixty-eight students per grade level, the amount of variation stemming from the idiosyncrasies of the particular sample of students being tested is often large relative to the total amount of variation observed between schools. The second arises from one-time factors that are not sensitive to the size of the sample; for example, a dog barking in the playground on the day of the test, a severe flu season, a disruptive student in a class, or favorable chemistry between a group of students and their teacher. Both small samples and other one-time factors can add considerable volatility to test score measures.
Thomas J. Kane and D. Staiger. "Volatility in School Test Scores: Implications for Test-Based Accountability Systems." Brookings Papers on Education Policy, 2002, pp. 235–283. https://doi.org/10.1353/PEP.2002.0010
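The two sources of volatility described in the Kane and Staiger abstract can be made concrete with standard arithmetic: if student scores are independent, the variance of a school's mean is roughly the within-school variance divided by the number of students tested, plus the variance of any school-wide one-time shock that affects every tested student equally. The Python sketch below illustrates this under stated assumptions. The helper name `school_mean_sd`, the independence assumption, the within-school SD of 1.0 (scores expressed in student-level SD units), and the shock SD of 0.05 are all hypothetical illustrative choices, not figures or methods from the paper; only the cohort size of sixty-eight comes from the abstract.

```python
import math


def school_mean_sd(within_sd: float, n_students: int, shock_sd: float = 0.0) -> float:
    """Hypothetical helper: SD of one school's mean score across repeated testings.

    Assumes independent student scores with standard deviation `within_sd`,
    plus one school-wide shock (e.g. a bad test day or a flu season) with
    standard deviation `shock_sd` that shifts every tested student equally.
    """
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    sampling_var = within_sd ** 2 / n_students  # sampling variation: shrinks as 1/n
    shock_var = shock_sd ** 2  # one-time factor: independent of cohort size
    return math.sqrt(sampling_var + shock_var)


if __name__ == "__main__":
    # Illustrative values only (assumed): scores in student-level SD units.
    for n in (68, 680):
        for shock in (0.0, 0.05):
            sd = school_mean_sd(1.0, n, shock)
            print(f"n={n:4d} shock_sd={shock:.2f} -> SD of school mean = {sd:.3f}")
```

Under these assumptions, a tenfold larger cohort shrinks the sampling component to about a third of its size (1/√10 ≈ 0.32), while the one-time shock sets a floor that no increase in sample size removes.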
This paper provides an analysis of the controversy surrounding the standard-setting process conducted by ACT Inc. for the National Assessment Governing Board (NAGB).1 This process is the most thoroughly planned, carefully executed, exhaustively evaluated, completely documented, and most visible of any standard-setting process of which I am aware. Extensive research was conducted to determine how best to develop each step in the process.2 A distinguished team of experts guided the process through its development and implementation.3 And the process has been open to scrutiny, with evaluators observing the design and implementation of every step. Any process can be improved with experience and with continuing research and development, and better methods for setting standards will likely be created in the future. Until such developments occur, however, this process (which NAGB calls the achievement levels setting, or ALS, process) is the model for how standard setting should be done. The question I attempt to answer here is: if the standard-setting process is of such high quality, why are the standards it produced so controversial? Although I think extremely well of the NAGB standard-setting process, interpreting the results of the ALS process is a very complex undertaking. A gap has become evident between the technical accuracy of the standards and the clarity of their meaning. The technical quality of the standards is very high. Statistical analyses have shown that the standards are well within the accepted bounds for amount of error in the estimated cutscores, and follow-up validity studies have provided supportive
M. Reckase. "The Controversy over the National Assessment Governing Board Standards." Brookings Papers on Education Policy, 2001, pp. 231–265. https://doi.org/10.1353/PEP.2001.0014
D. Grissmer and A. Flanagan. "Searching for Indirect Evidence for the Effects of Statewide Reforms." Brookings Papers on Education Policy, 2001, pp. 181–229. https://doi.org/10.1353/PEP.2001.0007
J. Bishop, F. Mañé, Michael Bishop, Joan Y. Moriarty
[Excerpt] Educational reformers and most of the American public believe that most teachers ask too little of their pupils. These low expectations, they believe, result in watered-down curricula and a tolerance of mediocre teaching and inappropriate student behavior. The result is that the prophecy of low achievement becomes self-fulfilling. Although research has shown that learning gains are substantially larger when students take more demanding courses,2 only a minority of students enroll in these courses. There are several reasons for this. Guidance counselors in many schools allow only a select few into the most challenging courses. While most schools give students and parents the authority to overturn counselor recommendations, many families are unaware they have that power or are intimidated by the counselor’s prediction of failure in the tougher class. As one student put it: “African-American parents, they settle for less, not knowing they can get more for their students.”
J. Bishop, F. Mañé, Michael Bishop, and Joan Y. Moriarty. "The Role of End-of-Course Exams and Minimum Competency Exams in Standards-Based Reforms." Brookings Papers on Education Policy, 2001, pp. 267–345. https://doi.org/10.1353/PEP.2001.0002