{"title":"并非所有的成就测试都是平等的","authors":"R. Hanson, Dick Schutz","doi":"10.2139/ssrn.1334907","DOIUrl":null,"url":null,"abstract":"Why We Were Interested:Standardized achievement tests have come to be recognized as the \"one and only acceptable means of measuring how well kids, teachers, and schools are doing. This is despite the fact that the tests don't in any way match the instruction that kids in any given class receive. Moreover, they don't even try to determine what kids have learned, but only how they stack up with kids at the same grade level. People inside and outside of testing recognize these fatal flaws, but they don't consider them fatal. The excuse is, \"They're the best we have. There's no better way.\" We thought there might indeed be a better way.What We Did:We had access to around 200,000 kids in about 450 schools in rural districts in 19 states in the US that were participating in a structured program in reading and math in grades 1-6. The instruction cut across grade levels to concentrate on matters the kids had not learned, irrespective of their grade. At the end of the year, we gave a large sample of the kids tests to get data on fourareas of instruction: reading decoding and math computation, that we termed \"definite instruction\" and reading comprehension and math concepts that we termed \"indefinite instruction. We used three different kinds of tests in each instructional area: off-the-shelf standardized achievement tests, tests that matched the general curriculum emphases at a given grade, and tests that matched the actual instruction the kids had received.We tested each kid at a level of the test that most closely matched the level of instruction received during the year (referred to as \"at instruction\") and also the next higher level for each kind of test (referred to as \"above instruction\").So we were able to look at what happened in the two subjects - reading and math - in the two areas of instruction - definite and indefinite - on the three kinds of tests - that varied in the degree they departed from the instruction received, and that also varied in terms of whether they focused on matters - the level of the instruction received or were above the level of the instruction received. Quite a bit to look at.What We Found Out:First we did the standard analyzes of how reliable the tests were and how they interrelate, It turned out that each test was acceptably reliable. And the tests in each subject were highly correlated. Had we stopped here we would have concluded, \"It really doesn't matter which kind of test you use. And since standardized tests are traditional, they win.\"-which is about what the testing experts say and what the public accepts. We went on to look at the information yielded by the standardized achievement test (SAT) Grade Equivalents. The SATs that most closely matched the instruction the kids received produced low grade equivalents. But these grade equivalents reflected the level of the instruction at which the kids were working. But SATs are not administered at the grade level at which the kid is (or should be) working, but rather at the grade level in which they are enrolled. 
Looking at the \"abovelevel\" results, which reflect this practice, we find higher Grade Equivalents (although still somewhat \"behind) despite the fact the percentage of items answered correctly was lower than for the \"at level\" test.Tests referenced to the general curriculum both showed a high level of performance \"at level\" on matters they had been taught and a lower level of performance on matters \"above level\" that they hadn't yet been taught.Finally, we looked at the differences between the results of \"definite\" and \"indefinite instruction. The results on the \"definite\" instruction were consistently higher than on the \"indefinite\" instruction. Secondly, with the exception of Reading Decoding at Grade 3 (which is where the remaining complicated and infrequently encountered words are the focus) the performance increases from grade to grade. With \"indefinite instruction\" the pattern is different. Here the performance rises to a peak and then declines! This is obviously a function of the nature of the indefinite instruction rather than the nature of the kids. Kids don't become weaker conceptually or comprehend less well with further instruction. It just appears that way when the definition of what is meant by the instructional rubric changes from grade to grade. \"It's not fair,\" as the kids might accurately say.Bottom Line:If you want to find out what kids have learned, test them on the instruction they've received. The further you depart from this common sense conclusion, the more misleading will be the results. It happens, however, that current testing practice departs as far from the conclusion as possible. Since people learn what they are taught (that's the point of teaching, right?) if that's what you test, the results will be a good deal more positive than what we are accustomed to seeing. Some kids learn more than others, but it does little good to rank them with each other. The trick is to determine what a kid has learned, in order to use these assets as a basis for further instruction.The more clearly you define the structure of an instructional matter, the more effective the instruction will be. The results of instruction that is \"all over the place\" will reflect the impoverished thought given to the structure, ever though today the results are commonly attributed to the poverty of kids.","PeriodicalId":371085,"journal":{"name":"CSN: Pedagogy (Topic)","volume":"595 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"All Achievement Tests are Not Created Equal\",\"authors\":\"R. Hanson, Dick Schutz\",\"doi\":\"10.2139/ssrn.1334907\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Why We Were Interested:Standardized achievement tests have come to be recognized as the \\\"one and only acceptable means of measuring how well kids, teachers, and schools are doing. This is despite the fact that the tests don't in any way match the instruction that kids in any given class receive. Moreover, they don't even try to determine what kids have learned, but only how they stack up with kids at the same grade level. People inside and outside of testing recognize these fatal flaws, but they don't consider them fatal. The excuse is, \\\"They're the best we have. 
There's no better way.\\\" We thought there might indeed be a better way.What We Did:We had access to around 200,000 kids in about 450 schools in rural districts in 19 states in the US that were participating in a structured program in reading and math in grades 1-6. The instruction cut across grade levels to concentrate on matters the kids had not learned, irrespective of their grade. At the end of the year, we gave a large sample of the kids tests to get data on fourareas of instruction: reading decoding and math computation, that we termed \\\"definite instruction\\\" and reading comprehension and math concepts that we termed \\\"indefinite instruction. We used three different kinds of tests in each instructional area: off-the-shelf standardized achievement tests, tests that matched the general curriculum emphases at a given grade, and tests that matched the actual instruction the kids had received.We tested each kid at a level of the test that most closely matched the level of instruction received during the year (referred to as \\\"at instruction\\\") and also the next higher level for each kind of test (referred to as \\\"above instruction\\\").So we were able to look at what happened in the two subjects - reading and math - in the two areas of instruction - definite and indefinite - on the three kinds of tests - that varied in the degree they departed from the instruction received, and that also varied in terms of whether they focused on matters - the level of the instruction received or were above the level of the instruction received. Quite a bit to look at.What We Found Out:First we did the standard analyzes of how reliable the tests were and how they interrelate, It turned out that each test was acceptably reliable. And the tests in each subject were highly correlated. Had we stopped here we would have concluded, \\\"It really doesn't matter which kind of test you use. And since standardized tests are traditional, they win.\\\"-which is about what the testing experts say and what the public accepts. We went on to look at the information yielded by the standardized achievement test (SAT) Grade Equivalents. The SATs that most closely matched the instruction the kids received produced low grade equivalents. But these grade equivalents reflected the level of the instruction at which the kids were working. But SATs are not administered at the grade level at which the kid is (or should be) working, but rather at the grade level in which they are enrolled. Looking at the \\\"abovelevel\\\" results, which reflect this practice, we find higher Grade Equivalents (although still somewhat \\\"behind) despite the fact the percentage of items answered correctly was lower than for the \\\"at level\\\" test.Tests referenced to the general curriculum both showed a high level of performance \\\"at level\\\" on matters they had been taught and a lower level of performance on matters \\\"above level\\\" that they hadn't yet been taught.Finally, we looked at the differences between the results of \\\"definite\\\" and \\\"indefinite instruction. The results on the \\\"definite\\\" instruction were consistently higher than on the \\\"indefinite\\\" instruction. Secondly, with the exception of Reading Decoding at Grade 3 (which is where the remaining complicated and infrequently encountered words are the focus) the performance increases from grade to grade. With \\\"indefinite instruction\\\" the pattern is different. Here the performance rises to a peak and then declines! 
This is obviously a function of the nature of the indefinite instruction rather than the nature of the kids. Kids don't become weaker conceptually or comprehend less well with further instruction. It just appears that way when the definition of what is meant by the instructional rubric changes from grade to grade. \\\"It's not fair,\\\" as the kids might accurately say.Bottom Line:If you want to find out what kids have learned, test them on the instruction they've received. The further you depart from this common sense conclusion, the more misleading will be the results. It happens, however, that current testing practice departs as far from the conclusion as possible. Since people learn what they are taught (that's the point of teaching, right?) if that's what you test, the results will be a good deal more positive than what we are accustomed to seeing. Some kids learn more than others, but it does little good to rank them with each other. The trick is to determine what a kid has learned, in order to use these assets as a basis for further instruction.The more clearly you define the structure of an instructional matter, the more effective the instruction will be. The results of instruction that is \\\"all over the place\\\" will reflect the impoverished thought given to the structure, ever though today the results are commonly attributed to the poverty of kids.\",\"PeriodicalId\":371085,\"journal\":{\"name\":\"CSN: Pedagogy (Topic)\",\"volume\":\"595 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-01-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CSN: Pedagogy (Topic)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.1334907\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CSN: Pedagogy (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.1334907","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Why We Were Interested:

Standardized achievement tests have come to be recognized as the "one and only acceptable means of measuring how well kids, teachers, and schools are doing." This is despite the fact that the tests don't in any way match the instruction that kids in any given class receive. Moreover, they don't even try to determine what kids have learned, only how they stack up against kids at the same grade level. People inside and outside of testing recognize these flaws, but they don't consider them fatal. The excuse is, "They're the best we have. There's no better way." We thought there might indeed be a better way.

What We Did:

We had access to around 200,000 kids in about 450 schools in rural districts in 19 US states who were participating in a structured program in reading and math in grades 1-6. The instruction cut across grade levels to concentrate on matters the kids had not yet learned, irrespective of their grade. At the end of the year, we gave a large sample of the kids tests to get data on four areas of instruction: reading decoding and math computation, which we termed "definite instruction," and reading comprehension and math concepts, which we termed "indefinite instruction." We used three different kinds of tests in each instructional area: off-the-shelf standardized achievement tests, tests that matched the general curriculum emphases at a given grade, and tests that matched the actual instruction the kids had received. We tested each kid at the level of each test that most closely matched the level of instruction received during the year (referred to as "at instruction") and also at the next higher level of each kind of test (referred to as "above instruction"). So we were able to look at what happened in the two subjects (reading and math), in the two areas of instruction (definite and indefinite), on the three kinds of tests, which varied in how far they departed from the instruction received, and at the two test levels, either at or above the level of the instruction received. Quite a bit to look at.

What We Found Out:

First we did the standard analyses of how reliable the tests were and how they interrelated. It turned out that each test was acceptably reliable, and the tests in each subject were highly correlated (a sketch of this kind of analysis appears after the Bottom Line below). Had we stopped here we would have concluded, "It really doesn't matter which kind of test you use. And since standardized tests are traditional, they win" - which is about what the testing experts say and what the public accepts.

We went on to look at the information yielded by the standardized achievement test (SAT) Grade Equivalents. The SATs that most closely matched the instruction the kids received produced low Grade Equivalents, but those Grade Equivalents reflected the level of instruction at which the kids were working. SATs, however, are not administered at the grade level at which the kid is (or should be) working, but rather at the grade level in which the kid is enrolled.
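To make the Grade Equivalent mechanics concrete, here is a minimal sketch of the pattern the next paragraph reports. Every number in it is invented for illustration and is not data from the study; the point is only that a Grade Equivalent is read off the norm table for the level administered, so each level compares the kid against a different norm group.

```python
# Hypothetical illustration of how a test given "above instruction" can
# report a higher Grade Equivalent (GE) than a test given "at instruction"
# even though the kid answers a smaller percentage of items correctly.
# All numbers are invented stand-ins, not results from the paper.
results = {
    "at instruction (matches work level)": {"pct_correct": 78, "grade_equivalent": 2.1},
    "above instruction (enrolled level)":  {"pct_correct": 55, "grade_equivalent": 2.6},
}

for condition, r in results.items():
    print(f"{condition}: {r['pct_correct']}% correct -> GE {r['grade_equivalent']}")

# The above-instruction test yields the higher GE despite the lower percent
# correct, because its harder items carry more grade credit in the norming.
```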
Looking at the "abovelevel" results, which reflect this practice, we find higher Grade Equivalents (although still somewhat "behind) despite the fact the percentage of items answered correctly was lower than for the "at level" test.Tests referenced to the general curriculum both showed a high level of performance "at level" on matters they had been taught and a lower level of performance on matters "above level" that they hadn't yet been taught.Finally, we looked at the differences between the results of "definite" and "indefinite instruction. The results on the "definite" instruction were consistently higher than on the "indefinite" instruction. Secondly, with the exception of Reading Decoding at Grade 3 (which is where the remaining complicated and infrequently encountered words are the focus) the performance increases from grade to grade. With "indefinite instruction" the pattern is different. Here the performance rises to a peak and then declines! This is obviously a function of the nature of the indefinite instruction rather than the nature of the kids. Kids don't become weaker conceptually or comprehend less well with further instruction. It just appears that way when the definition of what is meant by the instructional rubric changes from grade to grade. "It's not fair," as the kids might accurately say.Bottom Line:If you want to find out what kids have learned, test them on the instruction they've received. The further you depart from this common sense conclusion, the more misleading will be the results. It happens, however, that current testing practice departs as far from the conclusion as possible. Since people learn what they are taught (that's the point of teaching, right?) if that's what you test, the results will be a good deal more positive than what we are accustomed to seeing. Some kids learn more than others, but it does little good to rank them with each other. The trick is to determine what a kid has learned, in order to use these assets as a basis for further instruction.The more clearly you define the structure of an instructional matter, the more effective the instruction will be. The results of instruction that is "all over the place" will reflect the impoverished thought given to the structure, ever though today the results are commonly attributed to the poverty of kids.