In this alternately amusing and appalling exposé of the standardized test industry, fifteen-year veteran Todd Farley describes statisticians who make decisions about students without even looking at their test answers; state education officials willing to change the way tests are scored whenever they don’t like the results; and massive, multi-national, for-profit testing companies who regularly opt for expediency and profit over the altruistic educational goals of teaching and learning. Although there are absurd moments, as when Farley and coworkers had to grade students based on how they described the taste of their favorite food, the enormous importance of standardized tests in the post “No Child Left Behind” era makes this no laughing matter.
“This book is dynamite! The nice personal voice makes it utterly accessible and enticing, wholly apart from the terribly important ammunition it provides to those of us in the ‘testing wars’ at national and local levels.” - Jonathan Kozol, author of Savage Inequalities
Product dimensions: 5.40 (w) x 8.40 (h) x 0.70 (d)
About the Author
For fifteen years, Todd Farley worked for renowned companies on some of the most important standardized tests in America. During that time, he wrote and scored tests in math, reading, science, social studies, history, writing, health, and the arts. Since leaving the industry, he has contributed articles to Education Week, Rethinking Schools, and other publications. He lives in New York City.
Read an Excerpt
Making the Grades: My Misadventures in the Standardized Testing Industry
By Todd Farley
Berrett-Koehler Publishers, Inc.
Copyright © 2009 Todd Farley
All rights reserved.
Chapter One: Scoring Monkey
I BEGAN TO DOUBT the efficacy of standardized testing in 1994, about four hours into my first day scoring student responses to a state test. At the time I was a 27-year-old slacker/part-time grad student at the University of Iowa, and my friend Greg had referred me to NCS (National Computer Systems, a test-scoring company in Iowa City) as a good place to get decent-paying and easy work. Soon thereafter, after a perfunctory group interview that entailed no more than flashing my college diploma at an HR rep and penning a short essay about "teamwork" (an essay I'm pretty sure no one read), I had myself a career in "education."
On my first day, we new employees, as well as dozens of more experienced scorers, met at the company's rented property on the north side of Iowa City, a warren of tiny rooms filled with computers in the dank downstairs of an abandoned shopping mall. Within 10 minutes of sitting down, the gent sitting next to me—named Hank, a floppy leather hat perched on his head, a pair of leather saddlebags slung across his shoulder—confessed he had worked at NCS for years and regaled me with stories of his life. In no time he told me how he had overcome his nose-picking habit (a dab of Vaseline in the nostrils) and offered to show me the erotic novella he was writing, beginning to pull it from a saddlebag. I politely declined and wondered what I'd gotten myself into.
Other than Hank, around me was a bunch that looked no better. I had dressed how I thought appropriate for the first day of a new job (a pressed pair of khakis, loafers, a buttoned-down blue shirt), but all around my colleagues were slumped like bored college students and mid-1990s slackers in sweat pants and ripped jeans. A whole lot of heads seemed like they had not lately been shampooed; lots of faces looked groggy and uninterested.
The building itself also failed to inspire. We were belowground, 12 people sitting in our small room around two islands of six computer monitors each, the only windows about eight feet in the air and offering a view of the tires on the cars out back. Occasionally we could see the shoes of people walking by. The room was lowly lighted by phosphorescent bulbs and smelled antiseptic, like cleaning products and the musty industrial rugs that covered the floors. I couldn't imagine I could continue to work there, a man of my grandiose literary ambitions. My only hope was that the job itself would prove interesting.
After perhaps an hour's worth of idling about, waiting for management to seat everyone and file paperwork and start the computers, we began our task: the scoring of student responses to open-ended questions on standardized tests. The six people at my island of computers would score a fourth-grade reading test from a state on the Gulf of Mexico, the tests of those 9- and 10-year-olds from the Deep South being scored by this group of mostly white, midwestern adults. Before we began, however, we were trained on the process by our supervisor/"table leader," Anita.
Anita first showed us the item the students had been given, a task requiring them to read an article about bicycle safety before directing them to make a poster for other students to highlight some of those bike safety rules. Some of us mentioned it seemed like an interesting task, having the students use their creativity to show their understanding of bicycle safety by drawing a poster instead of asking them multiple-choice questions. I nodded to myself, smiling, approving that this first standardized test question I'd seen in years was open to so many possibilities. The question was definitely not rigid or stringent, and it allowed the students to respond in myriad ways.
Next, Anita explained the rubric we would use to score the student work (a rubric, or "scoring guide," is the instructions given to the professional scorers on how they should mete out credit to the student responses). She pointed out how easy the task would be to score, as it was a dichotomous item where students were given either full credit or no credit. If a student's poster showed a good example of a bicycle safety rule (like riding with a helmet or stopping at a stop sign), full credit was earned. If a student's poster showed a poor example of bicycle safety rules (like riding with no hands or riding two abreast in the road), no credit was earned. Finally, Anita showed us training papers, actual student work that had earned either full or no credit. She showed us 20 or 30 "Anchor Papers," examples of posters that had earned the score of 1 and others given the score of 0. Eventually she gave us unscored papers to practice with, reading the responses on our own and individually deciding what score to give. After we discussed the Practice Papers as a group and Anita was convinced we all understood the scoring rules, it was time to begin.
At that point I was operating under the impression the item was relevant and interesting. I also thought the rubric was absolutely clear and would be a breeze to apply. And from my experience scoring the Practice Papers, I expected to have absolutely no difficulty scoring the actual student responses. At that point, it was all so clean and clear and indisputable I would certainly have been counted among the converts to the idea that standardized testing could be considered "scientifically based research" (to which the No Child Left Behind Act alludes more than 100 times). At that point, I had no doubt I was involved in important work that could produce absolute results.
And then we started to score.
The thing about rubrics, I discovered (and would subsequently continue to discover over the years), is that while they are written by the best intentioned of assessment experts and classroom teachers, they can never—never!—come remotely close to addressing the million different perspectives students bring in addressing a task or the zillion different ways they answer questions. If nothing else, standardized testing has taught me the schoolchildren of America can be one creative bunch.
I bring this up because the very first student response I would ever score in my initial foray into the world of standardized testing was a bicycle safety poster that showed a young cyclist, a helmet tightly attached to his head, flying his bike in a fantastic parabola up and over a canal filled with flaming oil, his two arms waving wildly in the air, a gleeful grin plastered on his mug. A caption below the drawing screamed, "Remember to Wear Your Helmet!"
I stared at my computer screen (the students filled out their tests and those tests were then scanned into NCS's system for distribution to the scorers), looked at my rubric, and thought, "What the #@^&$!?!" In preparing to score the item, we'd all agreed how to apply the rubric and had addressed what seemed like simple issues: credit for good bicycle safety rules, no credit for bad ones. It had seemed so clear.
Looking at my screen, I muttered to myself, held both hands in the air in the universal sign of "Huh?" and flipped through the Anchor and Practice Papers while awaiting a revelation. Certainly the student had shown an understanding of at least one bicycle safety rule (the need for a helmet), which meant I was to give him the score of 1. On the other hand, the student had also indicated such a fundamental misunderstanding of a number of other cycling safety rules—keeping a firm grip on the handlebars, not biking through walls of fire—I couldn't see how I could ever award him full credit. I was actually more worried about the student's well-being than I was concerned with his score.
I held my palms up. I mumbled. I flipped through the training papers. Eventually Anita stood behind me, looking at my screen.
"What are you going to do here, Todd?" she asked.
"Good question," I said.
"Does the student show an understanding of a safety rule?" she asked.
"One safety rule," I said.
"And that means you're going to give it what score?" she asked.
"A 1?" I said, looking over my shoulder at her.
She nodded. "Yup."
"Really?" I asked her. "We don't care that as a result of following these 'safety rules' the student is almost certainly going to die?"
She laughed. "I think he was having fun, and he certainly knows how important helmets are."
"Yes, he does," I agreed. "Now let's hope he's wearing a nonflammable one when he crashes his no-hands bike into the burning oil."
She smiled, but less enthusiastically. "We don't make the rules, Todd, we just apply them. The state Department of Education says understanding one safety rule earns the student full credit, so we give them full credit."
I shook my head. "We don't care about the context? We count one good safety rule among three bad ones the same as we do one good rule?"
Anita smiled, perhaps ruefully. "One good safety rule earns full credit," she said. She turned to head back to her own computer, and I watched as she walked away. Hank looked at me, shrugged his shoulders, and smiled. One of the other scorers leaned in toward me and grinned.
"Basically," he said, "we are a bunch of scoring monkeys. No thought required."
"Just click," Hank added, making a motion with his mouse finger. "Just click."
They each nodded to me, shaking their heads slowly up and down, bemused looks on their faces. I realized the two of them had definitely drunk the NCS Kool-Aid.
So as Anita insisted, and for reasons that were clear to me but also hard to believe, I clicked on the 1 button, and the response was scored. In the parlance of NCS, I had officially become a "professional scorer," which seemed a slightly exaggerated title for the work I was doing. The poster of the helmeted daredevil slid off my screen and was replaced by another.
Many of the student responses were easy to score. Most students simply showed one safety rule (a biker stopped at a stop sign, another using hand signals to indicate their direction), and I would give those responses full credit. Others ignored safety rules entirely (showing a biker doing a wheelie in the middle of the street, for instance, or drawing unhelmeted cyclists jumping over fiery moats), and I gave those responses no credit. Other students earned no points for using the blank poster only as an opportunity to sketch, and there were enough doodles of family pets and "best friends forever" to reconsider the brilliant idea of having fourth graders draw pictures as a part of their tests.
Many of the student responses, however, were befuddling, and we scorers might not know what safety rule was being addressed. Sometimes the handwriting was hard to decipher, and for lengths of time the group would unsuccessfully ponder over a word like grit before giving up (later someone would yell "right," their mind having subconsciously solved that puzzle even though a score had long ago been given to the response). Other times the drawings were impossible to interpret, and whether we were looking at a biker or surfer or equestrian was not completely clear. On innumerable occasions the scanning of the tests made it incredibly hard to even see the student responses, leaving us leaning forward to squint at vague and fuzzy lines. Some of the drawings did include a caption to emphasize the safety rule ("Use hand signals!" or "Ride single file!"), but others let the drawings stand alone, leaving us confused. We would usually mull over the response on our screens by ourselves before eventually giving up.
"Is this poster indicating bikers should use hand signals?" someone would ask the group. We would huddle around his or her screen.
"I think so," someone would answer.
"No," I might say, "I think they're waving to a friend."
"No," another scorer would disagree, "I think that biker is giving someone the finger!" And we would laugh, but who really knew what that fourth-grade drawing was getting at?
"Really," the scorer sitting there would say, getting frustrated, "is this acceptable or not?"
The rest of us would begin to disperse.
"Good luck with that...."
And we would scatter back to our own desks, back to our own screens of problematic, fourth-grade, bike safety hieroglyphics.
Anita would always try to solve the problem. "Is there a clear bike safety rule?" she would ask. "If there is, credit it. If not, don't."
"What if we're not sure?" someone asked. "This might be a good rule."
"A clear bike safety rule gets credit," she said. "If not, it doesn't." Anita was a very efficient woman, very direct, and frankly, I liked her less with each passing minute. She acted like it was all so obvious, and meanwhile I was attempting to interpret the Crayola musings of a nine-year-old.
Anita's major contribution to our test scoring was in the form of backreading. As we scored the student responses, she would randomly review on her computer screen a small number of the scores each of the six of us were doling out, checking to see we were applying the rules correctly and in a consistent, standardized form. At times this was helpful, as Anita would call us up to her desk to show us a response we may have misscored.
"Remember, Todd," she might advise, pointing to a student response on her computer screen, "we are crediting 'riding in single file' as an acceptable safety rule. You gave this response a 0, but it should be a 1."
"Of course," I'd apologize, "Sorry. I'll credit it next time." Her advice was often helpful in remembering the rules and improving my scoring, so in general I was not averse to heeding her counsel. No one necessarily likes to be told they are wrong, but I understood what Anita was telling me was part of my learning curve at the new job. I stoically soldiered on.
Other times I thought Anita was nuts. Near the end of my first day, she called me up to her desk.
"You gave this the score of 0," she said. "How come?"
"I gave it a 0 because it doesn't show any bicycle safety rules," I said.
"That's not a bike at a stop sign?" she asked.
"No, that's a truck at a stop sign," I told her.
"And what's behind the truck?"
"Well," I said, feeling the blood rushing to my cheeks, "behind the truck is a flat-bed trailer, and securely fastened to that trailer by heavy chains is a bike without a rider." The other scorers began to giggle, laughing at my description and realizing Anita and I were on the verge of a small spat. They began to mill around the screen to look at the disputed student response.
"You cannot be telling me that poster shows any understanding of a bike safety rule."
"Yes, I can," Anita said.
"That might be a car-driving rule," I argued, "but it's not a bike safety rule."
"No way, Anita," someone chimed in, "there's not even a rider on the bike."
"Look," she said, her voice starting to rise, "the rule we've been adhering to is that a bike at a stop sign earns full credit."
"It's not a bike," I said. "It's a truck at a stop sign!"
"There's no one on the bike!" someone mumbled.
"Don't worry about it," Anita said. "Remember, all we can do is apply the scoring rules the state gave us. They said a bike at a stop sign is acceptable, so we credit it."
We headed back to our desks, considerable bitching going on along the way. I shook my head but had to laugh.
"My God," I continued, "we're going to wipe out the entire population of elementary students in that state. They're going to be riding into fires thinking they'll be saved by their helmets, going to think they only have to stop their bikes at stop signs if they're strapped to the back of a truck."
"Enough," Anita said. "It's been a good first day, so let's wrap it up. Just score the response on your screen, and then shut your computer down."
I shook my head, smirking. What did it all mean? Could the 1 or 0 that I was punching into the computer really tell anyone anything about these students? It all seemed so random. I decided to score that one final response on my big, first day before I could head home to take out my frustrations on the soccer field. One more response, I told myself, just do one more.
I looked at my screen, and I was amazed.
Excerpted from Making the Grades by Todd Farley. Copyright © 2009 by Todd Farley. Excerpted by permission of Berrett-Koehler Publishers, Inc. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Table of Contents
PART 1 WAGE SLAVE
Chapter 1. SCORING MONKEY
Chapter 2. NUMBERS
Chapter 3. THE WHEAT FROM THE CHAFF
Chapter 4. OFF TASK
PART 2 MANAGEMENT
Chapter 5. TABLE LEADER
Chapter 6. THE ORACLE OF PRINCETON
Chapter 7. THE KING OF SCORING
Chapter 8. A REAL JOB
PART 3 RETIREMENT
Chapter 9. MY OWN PRIVATE HALLIBURTON
Chapter 10. WORKING IN THEORY
Chapter 11. WARM BODIES
About the Author
Most Helpful Customer Reviews
For anyone interested in the test scoring industry, Todd Farley's book, Making the Grades, is a must read. This book is particularly relevant now that testing of students is a major issue in the schools around the country.
A stunning and highly readable book. We can't open the paper today without reading a story about education or education reform that mentions test scores. Test scores going up. Going down. Cheating. More tests being mandated. Teachers being rewarded or punished or even fired based on these standardized test scores. But after reading this book, you will never read these stories the same way again. Mr. Farley explains in great detail, based on his own 15 years in the standardized testing industry, how these tests are created and scored. Speed and profit come before accuracy. We have outsourced the education of our children to the lowest bidder, and the test results are universally junk. Garbage in, garbage out. Mr. Farley has been interviewed by CBS News and has written guest editorials for the New York Times and Washington Post. The No Child Left Behind/Race to the Top testing mania has gripped Washington under Republicans and Democrats alike. After reading this book, you'll be demanding congressional hearings.