We live in a world of Big Data, and it's getting bigger every day. Virtually every choice we make hinges on how someone generates data . . . and how someone else interprets it, whether we realize it or not.
Where do you send your child for the best education? Big Data. Which airline should you choose to ensure a timely arrival? Big Data. Who will you vote for in the next election? Big Data.
The problem is, the more data we have, the more difficult it is to interpret it. From world leaders to average citizens, everyone is prone to making critical decisions based on poor data interpretations.
In Numbersense, expert statistician Kaiser Fung explains when you should accept the conclusions of the Big Data "experts," and when you should say, "Wait . . . what?" He delves deeply into a wide range of topics, offering the answers to important questions, such as:
- How does the college ranking system really work?
- Can an obesity measure solve America's biggest healthcare crisis?
- Should you trust current unemployment data issued by the government?
- How do you improve your fantasy sports team?
- Should you worry about businesses that track your data?
Don't take for granted statements made in the media, by our leaders, or even by your best friend. We're on information overload today, and there's a lot of bad information out there.
Numbersense gives you the insight into how Big Data interpretation works, and how it too often doesn't work. You won't come away with the skills of a professional statistician. But you will have a keen understanding of the data traps even the best statisticians can fall into, and you'll trust the mental alarm that goes off in your head when something just doesn't seem to add up.
Praise for Numbersense
"Numbersense correctly puts the emphasis not on the size of big data, but on the analysis of it. Lots of fun stories, plenty of lessons learned; in short, a great way to acquire your own sense of numbers!"
Thomas H. Davenport, coauthor of Competing on Analytics and President’s Distinguished Professor of IT and Management, Babson College
"Kaiser’s accessible business book will blow your mind like no other. You’ll be smarter, and you won’t even realize it. Buy. It. Now."
Avinash Kaushik, Digital Marketing Evangelist, Google, and author, Web Analytics 2.0
"Each story in Numbersense goes deep into what you have to think about before you trust the numbers. Kaiser Fung ably demonstrates that it takes skill and resourcefulness to make the numbers confess their meaning."
John Sall, Executive Vice President, SAS Institute
"Kaiser Fung breaks the bad news (a ton more data is no panacea) but then has got your back, revealing the pitfalls of analysis with stimulating stories from the front lines of business, politics, health care, government, and education. The remedy isn't an advanced degree, nor is it common sense. You need Numbersense."
Eric Siegel, founder, Predictive Analytics World, and author, Predictive Analytics
"I laughed my way through this superb-useful-fun book and learned and relearned a lot. Highly recommended!"
Tom Peters, author of In Search of Excellence
Publisher: McGraw-Hill Professional Publishing
Product dimensions: 5.80(w) x 8.30(h) x 0.90(d)
Read an Excerpt
HOW TO USE BIG DATA TO YOUR ADVANTAGE
By KAISER FUNG
McGraw-Hill Education. Copyright © 2013 Kaiser Fung
All rights reserved.
Why Do Law School Deans Send Each Other Junk Mail?
The University of Michigan launched a special admissions program to its law school in September 2008. This Wolverine Scholars Program targeted the top sliver of Michigan undergraduates, those with a 3.80 cumulative grade point average (GPA) or higher at the Ann Arbor campus, allowing them to apply to the ninth-ranked law school as soon as they finish junior year, before the competition opens up to applicants from other universities. Admissions Dean Sarah Zearfoss described the initiative as a "love letter" from the Michigan Law School to its undergraduate division. She hoped this gesture would convince Michigan's brightest young brains to stay in Ann Arbor, rather than draining to other elite law schools.
One aspect of the Wolverine Scholars Program was curious, and immediately stirred much index-finger-wagging in the boisterous law-school blogosphere: The applicants do not have to submit scores from the Law School Admission Test (LSAT), a standard requirement of every applicant to Michigan and most other accredited law schools in the nation. Even more curiously, taking the LSAT is a cause for disqualification. Why would Michigan waive the LSAT for this and only this slice of applicants? The official announcement anticipated this question:
The Law School's in-depth familiarity with Michigan undergrad curricula and faculty, coupled with significant historic data for assessing the potential performance of Michigan undergrads at the Law School, will allow us to perform an intensive review of the undergraduate curriculum of applicants, even beyond the typical close scrutiny we devote ... For this select group of qualified applicants, therefore, we will omit our usual requirement that applicants submit an LSAT score.
In an interview with the Wall Street Journal, Zearfoss explained the statistical research: "We looked at a lot of historical data, and [3.80 GPA] is the number we found where, regardless of what LSAT the person had, they do well in the class." The admissions staff believed that some Wolverines with exceptional GPAs don't apply to Michigan Law School, deterred by the stellar LSAT scores of prior matriculating classes.
Many bloggers, themselves professors at rival law schools, were not eating the dog food. They smelled a brazen attempt to promote the national ranking—universally referred to as the U.S. News ranking, after U.S. News & World Report, the magazine that has created a lucrative business out of compiling all kinds of rankings—of Michigan's law program. Bill Henderson, who teaches at Indiana University, Bloomington, warned readers of the Legal Profession Blog that "an elite law school sets a new low in our obsession of form over substance—once again, we legal educators are setting a poor example for our students." The widely followed Above the Law blog was less charitable. In a post titled "Please Stop the Insanity," the editor complained that "the 'let's pretend that the LSAT is meaningless so long as you matriculate at Michigan' game is the worst kind of cynicism." He continued: "This ploy makes Michigan Law School look far worse than any sandwich-stealing homeless person ever could."
In recent years, U.S. News has run a one-horse race when it comes to ranking law schools. By contrast, there are no fewer than six organizations reaching for the wallets of prospective MBA students, including Businessweek, The Economist, the Wall Street Journal, and U.S. News & World Report. As students, alumni, and society embrace the U.S. News rankings, law school administrators have shelved their misgivings about the methodology, instead seeking ways to climb up the ladder. Jeffrey Stake, another Indiana University professor who studies law school rankings, lamented: "The question 'Is this person going to be a good lawyer?' is being displaced by 'Is this person going to help our numbers?'" Administrators fret over meaningless, minor switches in rank from one year to the next. One dean told sociologists Michael Sauder and Wendy Espeland how the university community reacted to a one-slot slippage:
When we dropped [out of the Top 50], we weren't called fifty-first, we were suddenly in this undifferentiated alphabetized thing called the second tier. So the [local newspaper's] headline is "[School X] Law School Drops to Second Tier." My students have a huge upset: "Why are we a second-tier school? What's happened to make us a second-tier school?"
Schools quickly realized that two components of the U.S. News formula—LSAT and undergraduate GPA—dominate all else. That's why the high GPA and no LSAT prerequisites of the Wolverine Scholars Program aroused suspicion among critics. Since the American Bar Association (ABA) requires a "valid and reliable admission test" to admit first-year J.D. (Doctor of Law) students, bloggers speculated that Michigan would get around the rule by using college admission test scores. Several other law schools, including Georgetown University (U.S. News rank #14), University of Minnesota (U.S. News rank #22), and University of Illinois (U.S. News rank #27), have rolled out similar programs aimed at their own undergraduates. At Minnesota, as at Michigan, the admissions officers do not just ignore LSAT scores; they shut the door on applicants who have taken the LSAT.
1. Playing Dean for One Day
Between retaining top students and boosting the school's ranking, one can debate which is the intended beneficiary, and which is the side effect of early admission schemes. One cannot but marvel at the silky manner by which Michigan killed two birds with one stone. Even though the school's announcement focused entirely on the students, the law bloggers promptly sniffed out the policy's unspoken impact on the U.S. News ranking. This is a great demonstration of NUMBERSENSE. They looked beyond the one piece of information fed to them, spotted a hidden agenda, and sought data to investigate an alternative story.
Knowing the mechanism of different types of formulas is the start of knowing how to interpret the numbers. With this in mind, we play Admissions Dean for a day. Not any Admissions Dean but the most cynical, most craven, most calculating Dean of an elite law school. We use every trick in the book, we leave no stone unturned, and we take no prisoners. The U.S. News ranking is the elixir of life; nothing else matters to us. It's a dog-eat-dog world: if we don't, our rivals will. We are rowing upstream, so standing still is drifting backwards.
Over the years, U.S. News editors have unveiled the gist of their methodology for ranking law schools. The general steps, common to most ranking procedures, are as follows:
1. Break up the overall rating into component scores.
2. Rate each component, using either survey results or submitted data.
3. Convert the component scores to a common scale, say 0 to 100.
4. Determine the relative importance of each component.
5. Compute the aggregate score as the weighted sum of the scaled component scores.
6. Express the aggregate score in the desired scale. For example, the College Board uses a scale of 200 to 800 for each section of the SAT.
Rankings are by nature subjective. Steps 1, 2, and 4 reflect the opinions of the designers of such formulas. The six business school rankings are not well correlated because their creators incorporate, measure, and emphasize different factors. For example, Businessweek bases 90 percent of its rating on reputation surveys, placing equal weight on a survey of recent graduates and a survey of corporate recruiters, while the Wall Street Journal considers only one factor: evaluation by corporate recruiters. Note that the scaling in Step 3, known as standardization, is needed to preserve the intended weights applied in Step 5.
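The six-step recipe above can be sketched in a few lines of Python. Everything here is illustrative: the schools, components, and weights are invented for the example and bear no relation to the actual U.S. News inputs.

```python
# A toy weighted-sum ranking. Schools, components, and weights are
# all hypothetical -- not the real U.S. News formula.
raw = {
    "School A": {"reputation": 4.2, "gpa": 3.70, "employment": 0.92},
    "School B": {"reputation": 3.8, "gpa": 3.85, "employment": 0.88},
    "School C": {"reputation": 4.5, "gpa": 3.55, "employment": 0.95},
}
weights = {"reputation": 0.5, "gpa": 0.3, "employment": 0.2}  # Step 4

def rescale(values):
    """Step 3: map one component onto a common 0-100 scale."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]

# Steps 1-3: break out and rescale each component score.
scaled = {s: {} for s in raw}
for c in weights:
    col = rescale([raw[s][c] for s in raw])
    for s, v in zip(raw, col):
        scaled[s][c] = v

# Step 5: aggregate score = weighted sum of scaled components.
scores = {s: sum(weights[c] * scaled[s][c] for c in weights) for s in raw}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # highest aggregate score first
```

Because Step 3 forces every component onto the same 0-100 scale, the weights in Step 5 mean what the designer intended; without standardization, a component measured in big numbers (say, salaries) would silently swamp one measured in small numbers (say, GPAs).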
Figure 1-1 illustrates the decisions made by U.S. News in designing their law school rating. The editors tally up 12 elements, grouped into four categories, using weights they and only they can explain. The two biggest components—assessment scores by peers, and by lawyers and judges—are obtained from surveys while the others make use of data self-reported by the schools.
From the moment the U.S. News ranking of law schools appeared in 1987, academics have mercilessly exposed its flaws and decried its arbitrary nature. Since reputations of institutions are built and sustained over decades, it seems silly to publish an annual ranking, particularly one in which schools swap seats frequently, and frequently in the absence of earth-shattering news. Using a relative scale produces the apparently illogical outcome that a school's ranking can move up or down without having done anything differently from the previous year while other schools implement changes. The design of the surveys is puzzling. Why do they expect the administrators of one school or the partners of one law firm to have panoramic vision of all 200 law schools? The rate of response for the professional survey is low, below 15 percent, and the survey sample is biased as it is derived from the Top Law Firms ranked by none other than U.S. News.
Such grumbling is valid. Yet such grumbling is pointless, and has proven futile against the potent marketing machine of U.S. News. The law school ranking, indeed any kind of subjective ranking, does not need to be correct; it just has to be believed. Even the much-maligned BCS (Bowl Championship Series) ranking of U.S. college football teams has a clearer path toward acceptance because the methodology can be validated in the postseason, when the top teams face off. The rivalry among law schools does not admit such duels, and thus, we have no means of verifying any method of ranking. There is no such thing as accuracy; the scarce commodity here is trust. The difference between the U.S. News ranking and the also-rans is the difference between branded, bottled water and tap water. In our time, we have come to adopt all types of rating products with flimsy scientific bases; we don't think twice while citing Nielsen television ratings, Michelin ratings for restaurants, Parker wine ratings, and lately, the Klout Score for online reputation.
The U.S. News ranking, if defeated, would yield to another flawed methodology, so law school deans might as well hold their noses. As the devious Admissions Dean, we want to game the system. And our first point of attack is the self-reported statistics. Paradoxically, these "objective" part-scores—such as undergraduate GPA and post-graduation employment rate—tempt manipulation more than the subjective reputation scores. That's because we are the single source of data.
2. Fakes, Cherry-Picking, and Missing-Card Tricks
The median undergraduate GPA of admitted students is a signal of a graduate school's quality, and also a key element of the U.S. News formula. The median is the mid-ranked value that splits a population in half. Michigan Law School's Class of 2013 had a median GPA of 3.73 (roughly equal to an A–), with half the class between 3.73 and 4.00, and the other half below 3.73.
The laziest way to raise the median GPA is to simply fake it. Faking is easy to do, but it is also easily exposed. The individual scores no longer tie to the aggregate statistic. To reduce the risk of detection, we inflate individual data to produce the desired median. The effort required is substantially higher, as we must fix up not just one student's score, but buckets of them. Statisticians call the median a robust statistic because it doesn't get flustered by a few extreme values.
Start with a median GPA of 3.73. If we rescinded an offer to someone with a GPA of 3.75 and gave the spot to a 4.00, the median would not budge, because the one with 3.75 already placed in the top half of the class. So substituting him or her with a 4.00 would not change the face of the median student. What if we swapped a 3.45 with a 4.00? It turns out the median would still remain unaltered. This is by design, as the U.S. News editors want to thwart cheating.
Figure 1-2 explains why the median is so level-headed. Removing the bottom block while inserting a new one at the top would shift the middle block down by one spot. The effect of swapping one student on the median is no larger than the difference between it and the value of its neighbor. This difference is truly minute at an elite program such as Michigan Law School, since the middle half of its class, about 180 students, fit into a super-tight band of 0.28 grade points, thanks to the sieve of its prestige. (For reference, the gap between B+ and A- is 0.33 grade points.)
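The median's stubbornness is easy to verify directly. A toy class of five hypothetical GPAs (not Michigan's actual figures) shows that trading the weakest admit for a 4.00 moves the median only as far as its immediate neighbor:

```python
import statistics

# Five hypothetical GPAs; the middle value is the median.
gpas = [3.45, 3.60, 3.73, 3.80, 3.95]
print(statistics.median(gpas))  # 3.73

# Swap the weakest admit (3.45) for a perfect 4.00. The median shifts
# only one spot, to its neighbor 3.80 -- a tiny move at a school where
# the middle of the class sits in a very tight GPA band.
swapped = sorted(gpas[1:] + [4.00])
print(statistics.median(swapped))  # 3.80
```

Contrast this with the mean, which would jump by the full 0.55-point difference spread over the class; that insensitivity to individual extreme values is exactly what "robust statistic" means.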
U.S. News editors might have thought that using the median prevents us from gaming the methodology, but they can't stifle our creativity now, can they? If we swap enough students, the median value will give. Of course, meddling with individual scores is a traceable act. We prefer methods that don't leave crumbs. By obsessively monitoring the median GPA throughout the admissions season, we construct the right profile, student by student, and avoid having to retouch submitted data.
Even more attractive are schemes with built-in protection. Few will condemn us for offering merit-based scholarships to compete with our peer institutions for the brightest students. Financial aid is one of the most important criteria students use to choose between schools. So we divert funds to those applicants with GPAs just above our target. At the same time, we withhold scholarships from top-notch students who might prefer our rivals. Instead of awarding one student a full scholarship, why not offer two half-scholarships to affect two applicants?
A flaw of most ranking systems, including the U.S. News flavor, is equating a GPA of 3.62 from one school with a GPA of 3.62 from a different school, even though everyone understands each school abides by its own grading culture, teachers create different expectations, courses differ by their level of difficulty, and classmates may be more or less competitive. This flaw is there to be exploited.
We favor those schools that deliver applicants with higher grade point averages. Colleges that take the higher ground—for instance, Princeton University initiated a highbrow "grade deflation" policy in 2004—can stay there while we take the higher GPAs from their blue-collar rivals. Similarly, we like academic departments that are generous with As, and that means more English or Education majors, and fewer Engineering or Science majors. No one can criticize us for accepting students with better qualifications. Cherry-picking schools and curricula occur under the radar, and our conscience is clean since we do not erase or falsify data.
When was the last time you slipped drinks into the movieplex while the attendant was looking the other way? We play a similar trick on the data analyst. Let's hide (weaker) students. Every year, applicants impress us in many ways other than earning top GPAs. Accepting these candidates sullies our median GPA, and hurts our precious U.S. News ranking. Instead of rejecting these promising students, we send them to summer school. Their course load is thus lessened in the fall term, and they turn into "part-time" students, who are ignored by U.S. News. Alternatively, or additionally, we encourage these applicants to shape up at a second-tier law school, and reapply after the first year as transfer students, who are also ignored by U.S. News.
These tactics exploit missing values. Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values. Reporting low GPAs as "not available" is a magic trick that causes the median GPA to rise. Sometimes, the statisticians attempt to fill in the blanks. Mean imputation is a technical phrase that means replacing any missing value with the average of the available values. If we submit a below-average GPA as "unknown," and the analyst converts all blanks into the average GPA, we'd have used a hired gun, wouldn't we? (See how this trick works in Figure 1-3.) If a student suffered depression during school, or studied abroad for a semester where the foreign university does not issue grades, or took on an inhumane course load, or faced whatever other type of unusual challenges, we simply scrub the offensive GPAs, under the guise of "leveling the playing field" for all applicants. Life is unfair even for students at elite colleges; since the same students would have earned much higher GPAs if they had attended an average school, we have grounds to adjust or invalidate their grades. We tell the media that the problem isn't that the numbers drag down our median, but that they are misleading! So good riddance to bad data.
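Both missing-data tricks can be seen with a small hypothetical example. The five GPAs below are invented; the point is only that hiding the lowest value raises the median whether the analyst drops the blank or mean-imputes it:

```python
import statistics

# Five hypothetical admits; the honest median is 3.70.
truth = [3.90, 3.75, 3.70, 3.55, 3.40]

# Trick 1: report the weakest GPA as "not available" (None).
reported = [3.90, 3.75, 3.70, 3.55, None]
available = [g for g in reported if g is not None]
# An analyst who simply drops the blank now sees a median of 3.725.
print(statistics.median(available))

# Trick 2: mean imputation -- the analyst fills the blank with the
# average of the available values, which also sits well above the
# hidden 3.40, so the median still lands at 3.725.
imputed = [g if g is not None else statistics.mean(available)
           for g in reported]
print(statistics.median(imputed))
```

Either way the school's reported median rises above the truth, and no individual score was ever falsified; the damage is done entirely by what goes missing.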
Excerpted from NUMBER SENSE by KAISER FUNG. Copyright © 2013 Kaiser Fung. Excerpted by permission of McGraw-Hill Education.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Table of Contents
List of Figures
PART 1 SOCIAL DATA
1 Why Do Law School Deans Send Each Other Junk Mail?
2 Can a New Statistic Make Us Less Fat?
PART 2 MARKETING DATA
3 How Can Sellouts Ruin a Business?
4 Will Personalizing Deals Save Groupon?
5 Why Do Marketers Send You Mixed Messages?
PART 3 ECONOMIC DATA
6 Are They New Jobs If No One Can Apply?
7 How Much Did You Pay for the Eggs?
PART 4 SPORTING DATA
8 Are You a Better Coach or Manager?