For decades we’ve been studying, experimenting with, and wrangling over different approaches to improving public education, and there’s still little consensus on what works, and what to do. The one thing people seem to agree on, however, is that schools need to be held accountable—we need to know whether what they’re doing is actually working. But what does that mean in practice?
High-stakes tests. Lots of them. And that has become a major problem. Daniel Koretz, one of the nation’s foremost experts on educational testing, argues in The Testing Charade that the whole idea of test-based accountability has failed—it has increasingly become an end in itself, harming students and corrupting the very ideals of teaching. In this powerful polemic, built on unimpeachable evidence and rooted in decades of experience with educational testing, Koretz calls out high-stakes testing as a sham, a false idol that is ripe for manipulation and shows little evidence of leading to educational improvement. Rather than setting up incentives to divert instructional time to pointless test prep, he argues, we need to measure what matters, and measure it in multiple ways—not just via standardized tests.
Right now, we’re lying to ourselves about whether our children are learning. And the longer we accept that lie, the more damage we do. It’s time to end our blind reliance on high-stakes tests. With The Testing Charade, Daniel Koretz insists that we face the facts and change course, and he gives us a blueprint for doing better.
Publisher: University of Chicago Press
Sold by: Barnes & Noble
File size: 930 KB
Read an Excerpt
Beyond All Reason
Pressure to raise scores on achievement tests dominates American education today. It shapes what is taught and how it is taught. It influences the problems students are given in math class (often questions from earlier tests), the materials they are given to read, the essays and other work they are required to produce, and often the manner in which teachers grade this work. It determines which educators are rewarded, punished, and even fired. In many cases it determines which students are promoted or graduate. This is the result of decades of "education reforms" that progressively expanded the amount of externally imposed testing and ratcheted up the pressure to raise scores. Although some people mistakenly identify these test-based reforms with the federal No Child Left Behind Act (NCLB) enacted in 2001, they began years earlier, and they will continue under the somewhat less draconian Every Student Succeeds Act (ESSA) that replaced NCLB in 2015.
A few examples will illustrate how extreme — often simply absurd — this focus on testing has become.
In 2012 two high schools in the Anaheim School District issued ID cards and day planners to students that were color-coded based on the students' performance on the previous year's standardized tests: platinum for those who scored at the "advanced" level, gold for those who scored "proficient," and white for everyone else. Students with premium cards were allowed to use a shorter lunch line and received discounts on entry to football games and other school activities.
Newspapers are replete with reports of students who are so stressed by testing that they become ill during testing or refuse to come to school. In 2013, for example, eight New York school principals jointly sent a letter to parents that included this: "We know that many children cried during or after testing, and others vomited or lost control of their bowels or bladders. Others simply gave up. One teacher reported that a student kept banging his head on the desk, and wrote, 'This is too hard,' and 'I can't do this,' throughout his test booklet."
In many schools it is not just testing itself that stresses students; they are also stressed by the unrelenting focus on scores and on their degree of preparation for the end-of-year accountability tests. For example, some schools post "data walls" that show each student's performance on practice tests used to prepare kids for the main event at the end of the year. This is intended to be motivating, but it shames some students. One third-grade teacher who caved in to pressure to post a data wall wrote this:
[One student,] I'll call her Janie, immediately noticed the two poster-size charts I'd hung low on the wall. Still wearing her jacket, she let her backpack drop to the floor and raised one finger to touch her name on the math achievement chart. Slowly, she traced the row of dots representing her scores for each state standard on the latest practice test. Red, red, yellow, red, green, red, red. Janie is a child capable of much drama, but that morning she just lowered her gaze to the floor and shuffled to her chair....
Even an adult faced with a row of red dots after her name for all her peers to see would have to dig deep into her hard-won sense of self to put into context what those red dots meant in her life and what she would do about them. An 8-year-old just feels shame.
The press to test students has sometimes been taken to lengths that are both absurd and cruel. Valerie Strauss of the Washington Post wrote a number of reports about students with severe cognitive disabilities — one born with only a brain stem — who were forced to take high-stakes tests. When one of them lay dying in a morphine coma, the school district refused to accept his mother's explanation that he was in hospice care and demanded written confirmation from the hospice agency that the student was indeed dying.
Shauna Paedae is a National Board Certified mathematics teacher with a bachelor's degree in mathematics, a master's degree in statistics, and three decades of experience as a teacher. During the 2011–12 school year she taught advanced mathematics in a high school in Pensacola, Florida: International Baccalaureate Mathematical Studies, Calculus, and Algebra 2. All but two of her students were in the eleventh and twelfth grades. That year 50 percent of her performance evaluation was based on a "value added measure" (VAM), a measure intended to show how much her teaching had contributed to students' performance gains on the Florida Comprehensive Assessment Test (FCAT). However, there were no FCAT mathematics tests administered above grade 8. Instead her district based her VAM on the school-wide performance of students taking the tenth-grade FCAT reading test — a test in a different subject administered, with only two exceptions, to different students in an earlier grade.
Kim Cook is a first-grade teacher in Alachua County, Florida, who was selected as her school's Teacher of the Year in 2012–13. In 2011–12 she had the same problem as Shauna: there are no FCAT tests in first grade. They are first administered in the third grade, and because Kim's school enrolls only students in preschool through second grade, no students in her school took the FCATs. Her school board resolved this problem by basing 40 percent of her evaluation on the test scores of fourth- and fifth-grade students in another school.
Paedae and Cook were among a group of plaintiffs who sued the Florida commissioner of education, members of the state board of education, and their local school boards in 2013 in an attempt to put an end to the absurd practice of evaluating teachers based on the performance of students they don't even teach, often in subjects they don't teach, and sometimes in different schools.
In August 2014 Rebecca Holcombe, the Vermont secretary of education, reported seemingly dire information about the performance of the state's schools. Like all states, Vermont accepts certain federal funds that require the state to follow the test-based accountability requirements of federal law (NCLB at that time, and now ESSA). Holcombe reported that under the terms of NCLB, every school in the state that had administered the state tests was classified as a low-performing school in need of improvement by the US Department of Education and was therefore subject to a series of escalating sanctions.
This bleak news, however, followed by less than a year another report from the US Department of Education indicating that in eighth-grade mathematics Vermont is very high performing, not only in comparison to other states but by international standards as well. For half a century the department has sponsored the National Assessment of Educational Progress (NAEP), a set of tests administered to representative samples of students across the country. The NAEP is widely considered the best test for monitoring overall trends in the performance of American students. The department linked the NAEP to the Trends in International Mathematics and Science Study (TIMSS) assessment, one of the two leading international comparative tests, "to provide each state with a way to examine how their students compare academically with their peers around the world in mathematics and science." The study included all fifty states as well as forty-seven countries. In eighth-grade mathematics Vermont ranked seventh; its average score was exceeded only by those of Massachusetts and five East Asian countries that always score near the top in international comparisons of mathematics achievement: Japan, Hong Kong, Taipei, Singapore, and Korea. Vermont outscored Finland, often held up as a high-achieving country the United States should emulate, by a large margin.
Thus Holcombe had to report to parents and the public that in terms of the accountability policies that were mandated by law, every school in one of the highest-performing jurisdictions in the world (even the schools that were at the very top of Vermont's very high distribution of scores) was performing so badly that it deserved sanctions. To her credit, Holcombe (a former student of mine) resolved this absurd contradiction in a reasonable if understated way. She wrote, "The Vermont Agency of Education does not agree with this federal policy, nor do we agree that all of our schools are low performing." Her sensible response, however, was very much an exception.
These examples, while extreme, are not anomalous. For example, Tennessee, like Florida, evaluates some teachers based on the scores obtained by students they don't teach, and in Tennessee as well, a lawsuit challenging this policy failed. New York State required that all teachers be evaluated with scores and gave districts the choice between finding tests for teachers for whom they had none — art teachers, for example — and evaluating those teachers with the scores of other teachers' students. New York City opted to follow the Florida model, with the exception that scores had to be from the same school. Vermont wasn't alone in having high-performing schools classified as failures under the provisions of NCLB; Washington, also a high-performing state, had nearly 90 percent of its schools classified as in need of improvement. There are abundant newspaper reports of teachers who are falsely classified as failing despite ample evidence that they are actually highly effective. Reports of students having somatic symptoms because of anxiety about high-stakes tests, or being forced to take them despite being ill, have appeared often in the media. And for every example that is so extreme as to be newsworthy, there are countless other unreported instances of misused test scores or undesirable responses to testing occurring in schools across the nation every day.
Test-based accountability has become an end in itself in American education, unmoored from clear thinking about what should be measured, how it should be measured, or how testing can fit into a rational plan for evaluating and improving our schools. It is hard to overstate how much this matters — for children, for educators, and for the American public.
The rationale for these policies is deceptively simple. American schools are not performing as well as we would like. They do not fare well in international comparisons, and there are appalling inequities across schools and districts in both opportunities for students and student performance. These problems have been amply documented. The prescription that has been imposed on educators and children in response is seductively simple: measure student performance using standardized tests and use those measurements to create incentives for higher performance. If we reward people for producing what we want, the logic goes, they will produce more of it. Schools will get better, and students will learn more.
However, this reasoning isn't just simple, it's simplistic — and the evidence is overwhelming that this approach has failed. That is not to say it hasn't produced any improvements. It has. But these improvements are few and small. Hard evidence is limited, a consequence of our failure as a nation to evaluate these programs appropriately before imposing them on all children. The best estimate is that test-based accountability may have produced modest gains in elementary-school mathematics but no appreciable gains in either reading or high-school mathematics — even though reading and mathematics have been its primary focus. These meager positive effects must be balanced against the many widespread and serious negative effects. Test-based accountability has led teachers to waste time on all manner of undesirable test preparation — for example, teaching children tricks to answer multiple-choice questions or ways to game the rules used to score the tests. Testing and test preparation have displaced a sizable share of actual instruction, in a school year that is already short by international standards. Test-based accountability has led to a corruption of the ideals of teaching. In an apparently increasing number of cases, it has led to manipulation of the tested population (for example, finding ways to keep low achievers from being tested) and outright cheating, some instances of which have led to criminal charges and even imprisonment. And it has created gratuitous and often enormous stress for educators, parents, and, most important, students.
Ironically, our heavy-handed use of tests for accountability has also undermined precisely the function that testing is best designed to serve: providing trustworthy information about student achievement. It has led to "score inflation": increases in scores much higher than the actual improvements in achievement that they are supposedly measuring. This problem was predicted by measurement experts nearly seventy years ago, and we have more than twenty years of research showing that false gains are common and often very large. It's not uncommon for gains on high-stakes tests to be several times as large as they should be. The result is illusions of progress: student performance appears to be improving far more than it really is. This cheats parents, students, and the public at large, who are being given a steady stream of seriously misleading good news.
Perhaps even worse, these bogus score gains are more severe in some schools than in others. The purpose of a test-based accountability system is to reward effective practice and encourage improvements. However, because score inflation varies from school to school and system to system, the wrong schools and programs are sometimes rewarded or punished, and the wrong practices may be touted as successful and emulated. And an increasing amount of evidence suggests that on average, schools that serve disadvantaged students engage in more test preparation and therefore inflate scores more, creating an illusion that the gap in achievement between disadvantaged and advantaged children is shrinking more than it is. This is another irony, as one of the primary justifications for the current test-based accountability programs has been to improve equity.
The evidence of these failures has been accumulating for more than a quarter century. Yet it is routinely ignored — in the design of educational programs, in public reporting of educational "progress," and in decisions about the fates of schools, students, and educators.
Don't make the mistake of thinking that these problems will disappear now that NCLB has finally been replaced. Test-based accountability was well established in this country before NCLB, and it will continue now that ESSA has replaced it. It's true that NCLB was a very poorly crafted set of policies (a train wreck waiting to happen, some of us said when it was enacted) and it did substantial harm. ESSA does remove some of the more draconian elements of NCLB, and that may help lessen some of the problems I describe here. Nevertheless, ESSA continues the basic model of test-based accountability, while returning to states just a fraction of the discretion they had in implementing this model before NCLB was enacted. Individual states started this ball rolling decades ago, so there isn't much reason to expect that they would turn in a fundamentally different direction now, even if ESSA permitted them to. And in any case, it doesn't let them change course anywhere near as much as I argue they should.
This book documents the failures of test-based accountability. I will describe some of the most egregious misuses and outright abuses of testing, and I will document some of the most serious negative effects. I'll explain why these effects have occurred. To put these harms into perspective, I will also describe the modest positive effects the testing policies have had.
Supporters of our current system will no doubt want to dismiss this book as yet another anti-testing or anti-accountability screed. It's neither. Standardized tests, if properly used, are a valuable and in some instances irreplaceable tool. They provide us with important information that is not available from other sources. For example, we all know that there is a troubling, large, and persistent gap in performance between white students and some minority students. How do we know that? Standardized tests. We've known for decades that American students don't perform as well in mathematics as students in many other countries. How do we know? Again, standardized tests. And the information in this book, as damning as it is regarding our current accountability system, is not an argument against accountability. My experience as a public school teacher, my years as the parent of children in public schools, and my decades of work as a researcher in education have made clear to me the need for more rigorous and effective accountability in public education.
Moreover, I am not questioning the motives of the many people who pushed for imposing test-based accountability on schools. Many, I know for a fact, had the best of intentions: they wanted to improve the quality of schools, to help all students learn more, and to narrow the gaps between advantaged and disadvantaged students.
Excerpted from "The Testing Charade"
Copyright © 2017 The University of Chicago.
Excerpted by permission of The University of Chicago Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Table of Contents
Acknowledgments
1. Beyond All Reason
2. What Is a Test?
3. The Evolution of Test-Based “Reform”
4. Campbell’s Law
5. Score Inflation
6. Cheating
7. Test Prep
8. Making Up Unrealistic Targets
9. Evaluating Teachers
10. Will the Common Core Fix This?
11. Did Kids Learn More?
12. Nine Principles for Doing Better
13. Doing Better
14. Wrapping Up
Notes
Index