Volatility in School Test Scores: Implications for Test-Based Accountability Systems

Brookings Papers on Education Policy. Pub Date: 2002-01-01. DOI: 10.1353/PEP.2002.0010

Thomas J. Kane, D. Staiger

Pages: 235-283. Publication type: Journal Article.

Citations: 275

Abstract

By the spring of 2000, forty states had begun using student test scores to rate school performance. Twenty states have gone a step further and are attaching explicit monetary rewards or sanctions to a school's test performance. For example, California planned to spend $677 million on teacher incentives in 2001, providing bonuses of up to $25,000 to teachers in schools with the largest test score gains. We highlight an under-appreciated weakness of school accountability systems (the volatility of test score measures) and explore the implications of that volatility for the design of school accountability systems. The imprecision of test score measures arises from two sources. The first is sampling variation, which is a particularly striking problem in elementary schools. With the average elementary school containing only sixty-eight students per grade level, the amount of variation stemming from the idiosyncrasies of the particular sample of students being tested is often large relative to the total amount of variation observed between schools. The second arises from one-time factors that are not sensitive to the size of the sample; for example, a dog barking in the playground on the day of the test, a severe flu season, a disruptive student in a class, or favorable chemistry between a group of students and their teacher. Both small samples and other one-time factors can add considerable volatility to test score measures.
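The two sources of imprecision described in the abstract can be illustrated with a small variance calculation. The sketch below is not the authors' method. It assumes standardized student scores (student-level standard deviation of 1) and a hypothetical standard deviation of 0.3 for true school means. Only the figure of sixty-eight students per grade comes from the abstract. The function names and the `one_time_sd` parameter are illustrative.

```python
def sampling_variance(student_sd: float, n_students: int) -> float:
    """Variance of a school's mean score from sampling alone: sd^2 / n."""
    return student_sd ** 2 / n_students


def noise_share(true_between_sd: float, student_sd: float,
                n_students: int, one_time_sd: float = 0.0) -> float:
    """Fraction of observed between-school variance that is noise.

    Noise = sampling variance (shrinks as n grows)
          + one-time factor variance (does not depend on n).
    """
    noise = sampling_variance(student_sd, n_students) + one_time_sd ** 2
    total = true_between_sd ** 2 + noise
    return noise / total


# Hypothetical inputs: 0.3 and 1.0 are assumptions, 68 is from the abstract.
for n in (68, 272):
    share = noise_share(true_between_sd=0.3, student_sd=1.0, n_students=n)
    print(f"n={n}: noise share of between-school variance = {share:.3f}")
```

Under these assumed values, quadrupling the grade size reduces the sampling component but leaves any one-time component untouched, which is the distinction the abstract draws between the two sources.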



