Super Crunchers: How Thinking by Numbers Is the New Way to Be Smart

by Ian Ayres

Why would a casino try and stop you from losing? How can a mathematical formula find your future spouse? Would you know if a statistical analysis blackballed you from a job you wanted?

Today, number crunching affects your life in ways you might never imagine. In this lively and groundbreaking new book, economist Ian Ayres shows how today's best and brightest organizations are analyzing massive databases at lightning speed to provide greater insights into human behavior. They are the Super Crunchers. From internet sites like Google and Amazon that know your tastes better than you do, to a physician's diagnosis and your child's education, to boardrooms and government agencies, this new breed of decision makers is calling the shots. And they are delivering staggeringly accurate results. How can a football coach evaluate a player without ever seeing him play? Want to know whether the price of an airline ticket will go up or down before you buy? How can a formula outpredict wine experts in determining the best vintages? Super crunchers have the answers. In this brave new world of equation versus expertise, Ayres shows us the benefits and risks, who loses and who wins, and how super crunching can be used to help, not manipulate us.

Gone are the days of solely relying on intuition to make decisions. No businessperson, consumer, or student who wants to stay ahead of the curve should make another keystroke without reading Super Crunchers.

From the Hardcover edition.

Editorial Reviews

Psychology and sociology have had their day; super crunching is the trend of the future. Yale Law School econometrician Ian Ayres argues that for those in the know, large-scale data analysis and hypothesis testing are already dominant. He proves his point with examples that cut across categories: travel pricing, medical diagnostics, online dating services, screenwriting, baseball, civil liability, gambling, Internet auctions, hiring practices. Super Crunchers not only offers you a sneak peek into these cutting-edge technologies; it provides a wide sampling of what they are teaching us. A thoroughly fascinating statistics tutorial.

Product Details

Publisher: Random House Publishing Group
Sold by: Random House
File size: 635 KB

Read an Excerpt

Chapter One

Who's Doing Your Thinking for You?

Recommendations make life a lot easier. Want to know what movie to rent? The traditional way was to ask a friend or to see whether reviewers gave it a thumbs-up.

Nowadays people are looking for Internet guidance drawn from the behavior of the masses. Some of these "preference engines" are simple lists of what's most popular. The New York Times lists the "most emailed articles." iTunes lists the top downloaded songs. lists the most popular Internet bookmarks. These simple filters often let surfers zero in on the greatest hits.

Some recommendation software goes a step further and tries to tell you what people like you enjoyed. tells you that people who bought The Da Vinci Code also bought Holy Blood, Holy Grail. Netflix gives you recommendations that are contingent on the movies that you yourself have recommended in the past. This is truly "collaborative filtering," because your ratings of movies help Netflix make better recommendations to others and their ratings help Netflix make better recommendations to you. The Internet is a perfect vehicle for this service because it's really cheap for an Internet retailer to keep track of customer behavior and to automatically aggregate, analyze, and display this information for subsequent customers.
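The "people who bought this also bought that" idea can be sketched with simple co-purchase counts. This is a minimal illustration only, not any retailer's actual method: the baskets are invented, the placeholder titles and the function names `build_co_counts` and `also_bought` are hypothetical, and real systems weigh far more signals.

```python
from collections import Counter
from itertools import combinations

# Invented purchase histories. Two titles echo the excerpt; the rest are placeholders.
baskets = [
    {"The Da Vinci Code", "Holy Blood, Holy Grail"},
    {"The Da Vinci Code", "Holy Blood, Holy Grail", "Another Thriller"},
    {"The Da Vinci Code", "Another Thriller"},
    {"The Da Vinci Code", "Holy Blood, Holy Grail"},
    {"A Cookbook", "A Gardening Guide"},
]

def build_co_counts(baskets):
    """Count how often each pair of titles appears in the same basket."""
    co_counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            co_counts.setdefault(a, Counter())[b] += 1
            co_counts.setdefault(b, Counter())[a] += 1
    return co_counts

def also_bought(title, co_counts, top_n=2):
    """Return the titles most often bought together with `title`."""
    return [other for other, _ in co_counts.get(title, Counter()).most_common(top_n)]

co_counts = build_co_counts(baskets)
recommendations = also_bought("The Da Vinci Code", co_counts)
print(recommendations)
```

Every new basket updates the counts, which is why each customer's behavior improves the recommendations shown to later customers.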

Of course, these algorithms aren't perfect. A bachelor buying a one-time gift for a baby could, for example, trigger the program into recommending more baby products in the future. Wal-Mart had to apologize when people who searched for Martin Luther King: I Have a Dream were told they might also appreciate a Planet of the Apes DVD collection. similarly offended some customers who searched for "abortion" and were asked "Did you mean adoption?" The adoption question was generated automatically simply because many past customers who searched for abortion had also searched for adoption.

Still, on net, collaborative filters have been a huge boon for both consumers and retailers. At Netflix, nearly two-thirds of the rented films are recommended by the site. And recommended films are rated half a star higher (on Netflix's five-star ranking system) than films that people rent outside the recommendation system.

While lists of most-emailed articles and best-sellers tend to concentrate usage, the great thing about the more personally tailored recommendations is that they diversify usage. Netflix can recommend different movies to different people. As a result, more than 90 percent of the titles in its 50,000-movie catalog are rented at least monthly. Collaborative filters let sellers access what Chris Anderson calls the "long tail" of the preference distribution. The Netflix recommendations let its customers put themselves in rarefied market niches that used to be hard to find.

The same thing is happening with music. At Pandora, users can type in a song or an artist that they like and almost instantaneously the website starts streaming song after song in the same genre. Do you like Cyndi Lauper and Smash Mouth? Voila, Pandora creates a Lauper/Smash Mouth radio station just for you that plays these artists plus others that sound like them. As each song is playing, you have the option of teaching the software more about what you like by clicking "I really like this song" or "Don't play this type of song again."

It's amazing how well this site works for both me and my kids. It not only plays music that each of us enjoys, but it also finds music that we like by groups we've never heard of. For example, because I told Pandora that I like Bruce Springsteen, it created a radio station that started playing the Boss and other well-known artists, but after a few songs it had me grooving to "Now" by Keaton Simons (and because of on-hand quick links, it's easy to buy the song or album on iTunes or Amazon). This is the long tail in action because there's no way a nerd like me would have come across this guy on my own. A similar preference system lets play more than 90 percent of its catalog of a million songs every month. has recently added its own "recommended stories" feature. It uses a cookie to keep track of the sixteen articles you've most recently read and uses automated text analysis to predict what new stories you'll want to read. It's surprising how accurate a sixteen-story history can be in kickstarting your morning reading. It's also a bit embarrassing: in my case American Idol articles are automatically recommended.

Still, Chicago law professor Cass Sunstein worries that there's a social cost to exploiting the long tail. The more successful these personalized filters are, the more we as a citizenry are deprived of a common experience. Nicholas Negroponte, MIT professor and guru of media technology, sees in these "personalized news" features the emergence of the "Daily Me"—news publications that expose citizens only to information that fits with their narrowly preconceived preferences. Of course, self-filtering of the news has been with us for a long time. Vice President Cheney only watches Fox News. Ralph Nader reads Mother Jones. The difference is that now technology is creating listener censorship that is diabolically more powerful. Websites like and started to allow users to produce "the newspaper of me" and "a personalized newscast." The goal is to create a place "where you decide what's the news." Google News allows you to personalize your newsgroups. Email alerts and RSS feeds allow you now to select "This Is the News I Want." If we want, we can now be relieved of the hassle of even glancing at those pesky news articles about social issues that we'd rather ignore.

All of these collaborative filters are examples of what James Surowiecki called "The Wisdom of Crowds." In some contexts, collective predictions are more accurate than the best estimate that any member of the group could achieve. For example, imagine that you offer a $100 prize to a college class for the student with the best estimate of the number of pennies in a jar. The wisdom of the group can be found simply by calculating their average estimate. It's been shown repeatedly that this average estimate is very likely to be closer to the truth than any of the individual estimates. Some people guess too high, and others too low—but collectively the high and low estimates tend to cancel out. Groups can often make better predictions than individuals.
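The penny-jar example can be sketched in a few lines. The true count and the guesses below are invented for illustration and were chosen so that the high and low errors roughly cancel; with other data the average is not guaranteed to beat the best individual guess.

```python
from statistics import mean

# Hypothetical jar and hypothetical student guesses (not data from the book).
TRUE_COUNT = 1000
guesses = [620, 850, 1400, 1130, 900, 1210, 760, 1090]

# The "wisdom of the group" is simply the average of the individual estimates.
crowd_estimate = mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_COUNT)

# Compare against the single most accurate student.
best_individual_error = min(abs(g - TRUE_COUNT) for g in guesses)

print(f"crowd estimate: {crowd_estimate}, off by {crowd_error}")
print(f"best single guess is off by {best_individual_error}")
```

In this made-up class, half the students guess too low and half too high, so the errors largely offset each other in the average.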

On the TV show Who Wants to Be a Millionaire, "asking the audience" produces the right answer more than 90 percent of the time (while phoning an individual friend produces the right answer less than two-thirds of the time). Collaborative filtering is a kind of tailored audience polling. People who are like you can make pretty accurate guesses about what types of music or movies you'll like. Preference databases are powerful ways to improve personal decision making.

eHarmony Sings a New Tune

There is a new wave of prediction that utilizes the wisdom of crowds in a way that goes beyond conscious preferences. The rise of eHarmony is the discovery of a new wisdom of crowds through Super Crunching. Unlike traditional dating services that solicit and match people based on their conscious and articulated preferences, eHarmony tries to find out what kind of person you are and then matches you with others who the data say are most compatible. eHarmony looks at a large database of information to see what types of personalities actually are happy together as couples.

Neil Clark Warren, eHarmony's founder and driving force, studied more than 5,000 married people in the late 1990s. Warren patented a predictive statistical model of compatibility based on twenty-nine different variables related to a person's emotional temperament, social style, cognitive mode, and relationship skills.

eHarmony's approach relies on the mother of Super Crunching techniques—the regression. A regression is a statistical procedure that takes raw historical data and estimates how various causal factors influence a single variable of interest. In eHarmony's case the variable of interest is how compatible a couple is likely to be. And the causal factors are twenty-nine emotional, social, and cognitive attributes of each person in the couple.

The regression technique was developed more than 100 years ago by Francis Galton, a cousin of Charles Darwin. Galton estimated the first regression line way back in 1877. Remember Orley Ashenfelter's simple equation to predict the quality of wine? That equation came from a regression. Galton's very first regression was also agricultural. He estimated a formula to predict the size of sweet pea seeds based on the size of their parent seeds. Galton found that the offspring of large seeds tended to be larger than the offspring of average or small seeds, but they weren't quite as large as their large parents.

Galton calculated a different regression equation and found a similar tendency for the heights of sons and fathers. The sons of tall fathers were taller than average but not quite as tall as their fathers. In terms of the regression equation, this means that the formula predicting a son's height will multiply the father's height by some factor less than one. In fact, Galton estimated that every additional inch that a father was above average only contributed two-thirds of an inch to the son's predicted height.
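Galton's two-thirds finding can be written as a one-line prediction rule. The sketch below assumes, for illustration only, an average height of 69 inches for both fathers and sons; that figure and the function name `predict_son_height` are hypothetical, while the two-thirds factor is the one quoted above.

```python
SLOPE = 2 / 3  # Galton's estimated factor, as quoted in the text

def predict_son_height(father_height, average_height=69.0):
    """Average height plus two-thirds of the father's deviation from average.

    average_height is an assumed placeholder, not a figure from the book.
    """
    return average_height + SLOPE * (father_height - average_height)

tall_father = 75.0   # six inches above the assumed average
short_father = 63.0  # six inches below the assumed average

print(predict_son_height(tall_father))
print(predict_son_height(short_father))
```

Because the factor is less than one, a father six inches above average predicts a son only about four inches above average, and the same shrinkage applies to fathers below average.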

He found the pattern again when he calculated the regression equation estimating the relationship between the IQ of parents and children. The children of smart parents were smarter than the average person but not as smart as their folks. The very term "regression" doesn't have anything to do with the technique itself. Galton just called the technique a regression because the first things that he happened to estimate displayed this tendency, which Galton called "regression toward mediocrity" and which we now call "regression toward the mean."

The regression literally produces an equation that best fits the data. Even though the regression equation is estimated using historical data, the equation can be used to predict what will happen in the future. Galton's first equation predicted seed and child size as a function of their progenitors' size. Orley Ashenfelter's wine equation predicted how temperature and rain would impact wine quality.
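To make "an equation that best fits the data" concrete, here is a minimal sketch of ordinary least squares with a single predictor. The seed sizes are invented, the function name `fit_line` is hypothetical, and this is the textbook one-variable formula rather than the multi-variable models the services in this chapter use.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented historical data: parent seed size and offspring seed size.
parent_sizes = [15, 16, 17, 18, 19, 20, 21]
offspring_sizes = [16.0, 16.4, 16.6, 17.0, 17.4, 17.6, 18.0]

slope, intercept = fit_line(parent_sizes, offspring_sizes)

# Use the fitted equation to predict offspring size for a new parent seed.
new_parent = 22
prediction = intercept + slope * new_parent
print(f"slope={slope:.3f}, intercept={intercept:.3f}, prediction={prediction:.2f}")
```

The equation is estimated from past observations but applied to a parent size that was never observed, which is the sense in which a regression predicts the future.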

eHarmony produced a formula to predict preference. Unlike the Netflix or Amazon preference engines, the eHarmony regression is trying to match compatible people by using personality and character traits that people may not even know they have or be able to articulate. Indeed, eHarmony might match you with someone that you might never have imagined that you could like. This is the wisdom of crowds that goes beyond the conscious choices of individual members to see what works at unconscious, hidden levels.

eHarmony is not alone in trying to use data-driven matching. Perfectmatch matches users based on a modified version of the Myers-Briggs personality test. In the 1940s, Isabel Briggs Myers and her mother Katharine Briggs developed a test based on psychiatrist Carl Jung's theory of personality types. The Myers-Briggs test classifies people into sixteen different basic types. Perfectmatch uses this M-B classification to pair people who have personalities that historically have the highest probability of forming lasting relationships.

Not to be outdone, collects data from its clients on ninety-nine relationship factors and feeds the results into a regression formula to calculate the compatibility index score between any two members. In essence, will tell you the likelihood you will get along with anyone else.

While all three services crunch numbers to make their compatibility predictions, their results are markedly different. eHarmony believes in finding people who are a lot like you. "What our research kept saying," Warren has observed, "is [to] find somebody whose intelligence is a lot like yours, whose ambition is a lot like yours, whose energy is a lot like yours, whose spirituality is a lot like yours, whose curiosity is a lot like yours. It was a similarity model."

Perfectmatch and in contrast look for complementary personalities. "We all know, not just in our heart of hearts, but in our experience, that sometimes we're attracted [to], indeed get along better with,  somebody different from us," says Pepper Schwartz, the empiricist behind Perfectmatch. "So the nice thing about the Myers-Briggs was it's not just characteristics, but how they fit together."

This disagreement over results isn't the way data-driven decision making is supposed to work. The data should be able to adjudicate whether similar or complementary people make better matches. It's hard to tell who's right, because the industry keeps its analysis and the data on which the analysis is based a tightly held secret. Unlike the data from a bunch of my studies (on taxicab tipping, affirmative action, and concealed handguns) that anyone can freely download from the Internet, the data behind the matching rules at the Internet dating services are proprietary.

Mark Thompson, who developed Yahoo! Personals, says it's impractical to apply social science standards to the market. "The peer-review system is not going to apply here," Thompson says. "We had two months to develop the system for Yahoo! We literally worked around the clock. We did studies on 50,000 people."

The matching sites, meanwhile, are starting to compete on validating their claims. emphasizes that it is the only site which had its methodology certified by an independent auditor.'s chief psychologist James Houran is particularly dismissive of eHarmony's data claims. "I've seen no evidence they even conducted any study that forms the basis of their test," Houran says. "If you're touting that you're doing something scientific . . . you inform the academic community."

eHarmony is responding by providing some evidence that their matching system works. It sponsored a Harris poll suggesting that eHarmony is now producing about ninety marriages a day (that's over 30,000 a year). This is better than nothing, but it's only a modest success because with more than five million members, these marriages represent only about a 1 percent chance that your $50 fee will produce a walk down the aisle. The competitors are quick to dismiss the marriage number. Yahoo!'s Thompson has said you have a better chance of finding your future spouse if you "go hang out at the Safeway."
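The rough arithmetic behind the "1 percent" figure can be checked directly. The inputs are the numbers quoted above; whether each marriage counts as one member's success or two is not stated in the text, so both readings are shown as assumptions.

```python
# Figures quoted in the text.
marriages_per_day = 90
members = 5_000_000

marriages_per_year = marriages_per_day * 365

# Assumption A: each marriage counts as a success for one member.
rate_one_member = marriages_per_year / members

# Assumption B: both spouses are members, so each marriage is two successes.
rate_two_members = 2 * marriages_per_year / members

print(f"marriages per year: {marriages_per_year}")
print(f"success rate, one member per marriage: {rate_one_member:.2%}")
print(f"success rate, two members per marriage: {rate_two_members:.2%}")
```

Under either assumption the yearly rate comes out in the neighborhood of 1 percent, consistent with the estimate in the paragraph above.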

eHarmony also claims that it has evidence that its married couples are in fact more compatible. Its researchers presented last year to the American Psychological Society their finding that married couples who found each other through eHarmony were significantly happier than couples married for a similar length of time who met by other means. There are some serious weaknesses with this study, but the big news for me is that the major matching sites are not just Super Crunching to develop their algorithms; they're Super Crunching to prove that their algorithms got it right.

The matching algorithms of these services aren't, however, completely data-driven. All the services rely at least partially on the conscious preferences of their clients (regardless of whether these preferences are valid predictors of compatibility). eHarmony allows clients to discriminate on the race of potential mates. Even though it's only acting on the wishes of its clients, matching services that discriminate by race may violate a statute dating back to the Civil War that prohibits race discrimination in contracting. Think about it. eHarmony is a for-profit company that takes $50 from black clients and refuses to treat them the same (match them with the same people) as some white clients. A restaurant would be in a lot of trouble if it refused to seat Hispanic customers in a section where customers had stated a preference to have "Anglos only."

eHarmony has gotten into even more trouble for its refusal to match same sex couples. The founder's wife and senior vice president, Marylyn Warren, has claimed that "eHarmony is meant for everybody. We do not discriminate in any way." This is clearly false. They would refuse to match two men even if, based on their answers to the company's 436 questions, the computer algorithm picked them to be the most compatible. There's a sad irony here. eHarmony, unlike its competitors, insists that similar people are the best matches. When it comes to gender, it insists that opposites attract. Out of the top ten matching sites, eHarmony is the only one that doesn't offer same-sex matching.

Why is eHarmony so out of step? Its refusal to match gay and lesbian clients, even in Massachusetts where same-sex marriage is legal, seems counter to the company's professed goal of helping people find lasting and satisfying marriage partners. Warren is a self-described "passionate Christian" who for years worked closely with James Dobson's Focus on the Family. eHarmony is only willing to facilitate certain types of legal marriages regardless of what the statistical algorithm says. In fact, because the algorithm is not public, it is possible that eHarmony puts a normative finger on the scale to favor certain clients.

Meet the Author

Ian Ayres, an econometrician and lawyer, is the William K. Townsend Professor at Yale Law School, and a professor at Yale's School of Management. He is a regular commentator on public radio's Marketplace and a columnist for Forbes magazine. He is currently the editor of the Journal of Law, Economics and Organization, and has written eight books and more than a hundred articles.

Customer Reviews

Super Crunchers: How Thinking by Numbers Is the New Way to Be Smart 3.9 out of 5 based on 0 ratings. 15 reviews.
Guest More than 1 year ago
Just look at the first chapter, where Ian Ayres touts his research on LoJack devices. There is no discussion of why almost all the insurance companies oppose giving any discount for installing the devices. Presumably there are too few purchases of the device because if I hide a LoJack on my car, even those without LoJack benefit, since car thieves can't tell if a car is protected before they take it. Even with free-riding problems, if people got their cars back in generally one piece, why shouldn't the insurance companies want to give some discount? If there is a free-riding problem, it could be solved by car companies putting the device on all their cars. For example, if Porsche put LoJacks on its cars, Porsche is protected without any beneficial spillover for others. Yet no one (not Porsche, BMW, Cadillac, etc.) follows this policy. Couldn't Ayres discuss these problems? Couldn't he even mention them? What about the empirical work that confirms these car or insurance companies might not be as stupid as Ayres claims that they are? If he has a response, why not even mention these problems? He touts research reportedly showing that more abortion reduces crime, but he fails to note that if one actually did what the authors said should be done to conduct the tests, the effect went away (see "Abortion, crime and econometrics," Economist Magazine, December 1, 2005). Again, why not mention these problems? Other parts of the book also have problems. Ayres' empirical work on discrimination has also been extensively criticized, but no one would ever know about these problems from his discussion. The book would have given readers a better feel for what empirical work entails if, instead of just making assertions about findings (even when those findings have been proven to be wrong), he had spent even a little time showing how people learn from these debates over his research.
A book touting the importance of empirical work would gain some credibility if Ayres acknowledged the objections raised to his and his friend's work and explained why their results still held. The personal attacks that Ayres makes in the book are also filled with inaccuracies.
Anonymous More than 1 year ago
When he/she is revealed to be the new companion I call dibs!
Anonymous More than 1 year ago
Is she?
Anonymous More than 1 year ago
The second Doctor Whooves turns around, the Angel Statue moves....
Anonymous More than 1 year ago
Yes. Then my boyfriend was erased by the crack in my wall... and now I'm not really.
Seghetto More than 1 year ago
This was a quick read, but it was a fun read. Professor Ayres is a great writer (despite the hubbub with the plagiarism). I have a background in economics, so I have a bit better than average grasp of the subject that he explored. The applications of data mining are many, yet we still have to make decisions for ourselves, and advertising will never be perfect. Data mining is fetishized now, and people expect big data to solve all our problems. This book doesn't take much of a critical approach; it is more of a "hey, look, this is cool" approach.
dhweinflash More than 1 year ago
This book takes off where the other left off.
Guest More than 1 year ago
This book chronicles how data-driven decision making is changing marketing, sports, government policy, entertainment and other industries. This will influence how we purchase products and services and set policy, and it will affect critical functions of decision makers. It is written in layman's terms and is an eye-opener for everyone. I finished it in 3 days because it was so compelling.