Read an Excerpt
At about the time The Signal and the Noise was first published in September 2012, “Big Data” was on its way becoming a Big Idea. Google searches for the term doubled over the course of a year,1 as did mentions of it in the news media.2 Hundreds of books were published on the subject. If you picked up any business periodical in 2013, advertisements for Big Data were as ubiquitous as cigarettes in an episode of Mad Men.
But by late 2014, there was evidence that trend had reached its apex. The frequency with which Big Data was mentioned in corporate press releases had slowed down and possibly begun to decline.3 The technology research firm Gartner even declared that Big Data had passed the peak of its “hype cycle.”4
I hope that Gartner is right. Coming to a better understanding of data and statistics is essential to help us navigate our lives. But as with most emerging technologies, the widespread benefits to science, industry, and human welfare will come only after the hype has died down.
FIGURE P-1: BIG DATA MENTIONS IN CORPORATE PRESS RELEASES
I worry that certain events in my life have contributed to the hype cycle. On November 6, 2012, the statistical model at my Web site FiveThirtyEight “called” the winner of the American presidential election correctly in all fifty states. I received a congratulatory phone call from the White House. I was hailed as “lord and god of the algorithm” by The Daily Show’s Jon Stewart. My name briefly received more Google search traffic than the vice president of the United States.
I enjoyed some of the attention, but I felt like an outlier—even a fluke. Mostly I was getting credit for having pointed out the obvious—and most of the rest was luck.*
To be sure, it was reasonably clear by Election Day that President Obama was poised to win reelection. When voters went to the polls on election morning, FiveThirtyEight’s statistical model put his chances of winning the Electoral College at about 90 percent.* A 90 percent chance is not quite a sure thing: Would you board a plane if the pilot told you it had a 90 percent chance of landing successfully? But when there’s only reputation rather than life or limb on the line, it’s a good bet. Obama needed to win only a handful of the swing states where he was tied or ahead in the polls; Mitt Romney would have had to win almost all of them.
But getting every state right was a stroke of luck. In our Election Day forecast, Obama’s chance of winning Florida was just 50.3 percent—the outcome was as random as a coin flip. Considering other states like Virginia, Ohio, Colorado, and North Carolina, our chances of going fifty-for-fifty were only about 20 percent.5 FiveThirtyEight’s “perfect” forecast was fortuitous but contributed to the perception that statisticians are soothsayers—only using computers rather than crystal balls.
This is a wrongheaded and rather dangerous idea. American presidential elections are the exception to the rule—one of the few examples of a complex system in which outcomes are usually more certain than the conventional wisdom implies. (There are a number of reasons for this, not least that the conventional wisdom is often not very wise when it comes to politics.) Far more often, as this book will explain, we overrate our ability to predict the world around us. With some regularity, events that are said to be certain fail to come to fruition—or those that are deemed impossible turn out to occur.
If all of this is so simple, why did so many pundits get the 2012 election wrong? It wasn’t just on the fringe of the blogosphere that conservatives insisted that the polls were “skewed” toward President Obama. Thoughtful conservatives like George F. Will6 and Michael Barone7 also predicted a Romney win, sometimes by near-landslide proportions.
One part of the answer is obvious: the pundits didn’t have much incentive to make the right call. You can get invited back on television with a far worse track record than Barone’s or Will’s—provided you speak with some conviction and have a viewpoint that matches the producer’s goals.
An alternative interpretation is slightly less cynical but potentially harder to swallow: human judgment is intrinsically fallible. It’s hard for any of us (myself included) to recognize how much our relatively narrow range of experience can color our interpretation of the evidence. There’s so much information out there today that none of us can plausibly consume all of it. We’re constantly making decisions about what Web site to read, which television channel to watch, and where to focus our attention.
Having a better understanding of statistics almost certainly helps. Over the past decade, the number of people employed as statisticians in the United States has increased by 35 percent8 even as the overall job market has stagnated. But it’s a necessary rather than sufficient part of the solution. Some of the examples of failed predictions in this book concern people with exceptional intelligence and exemplary statistical training—but whose biases still got in the way.
These problems are not so simple and so this book does not promote simple answers to them. It makes some recommendations but they are philosophical as much as technical. Once we’re getting the big stuff right—coming to a better understanding of probably and uncertainty; learning to recognize our biases; appreciating the value of diversity, incentives, and experimentation—we’ll have the luxury of worrying about the finer points of technique.
Gartner’s hype cycle ultimately has a happy ending. After the peak of inflated expectations there’s a “trough of disillusionment”—what happens when people come to recognize that the new technology will still require a lot of hard work.
FIGURE P-2: GARTNER’S HYPE CYCLE
But right when views of the new technology have begun to lapse from healthy skepticism into overt cynicism, that technology can begin to pay some dividends. (We’ve been through this before: after the computer boom in the 1970s and the Internet commerce boom of the late 1990s, among other examples.) Eventually it matures to the point when there are fewer glossy advertisements but more gains in productivity—it may even have become so commonplace that we take it for granted. I hope this book can accelerate the process, however slightly.
This is a book about information, technology, and scientific progress. This is a book about competition, free markets, and the evolution of ideas. This is a book about the things that make us smarter than any computer, and a book about human error. This is a book about how we learn, one step at a time, to come to knowledge of the objective world, and why we sometimes take a step back.
This is a book about prediction, which sits at the intersection of all these things. It is a study of why some predictions succeed and why some fail. My hope is that we might gain a little more insight into planning our futures and become a little less likely to repeat our mistakes.
More Information, More Problems
The original revolution in information technology came not with the microchip, but with the printing press. Johannes Gutenberg’s invention in 1440 made information available to the masses, and the explosion of ideas it produced had unintended consequences and unpredictable effects. It was a spark for the Industrial Revolution in 1775,1 a tipping point in which civilization suddenly went from having made almost no scientific or economic progress for most of its existence to the exponential rates of growth and change that are familiar to us today. It set in motion the events that would produce the European Enlightenment and the founding of the American Republic.
But the printing press would first produce something else: hundreds of years of holy war. As mankind came to believe it could predict its fate and choose its destiny, the bloodiest epoch in human history followed.2
Books had existed prior to Gutenberg, but they were not widely written and they were not widely read. Instead, they were luxury items for the nobility, produced one copy at a time by scribes.3 The going rate for reproducing a single manuscript was about one florin (a gold coin worth about $200 in today’s dollars) per five pages,4 so a book like the one you’re reading now would cost around $20,000. It would probably also come with a litany of transcription errors, since it would be a copy of a copy of a copy, the mistakes having multiplied and mutated through each generation.
This made the accumulation of knowledge extremely difficult. It required heroic effort to prevent the volume of recorded knowledge from actually decreasing, since the books might decay faster than they could be reproduced. Various editions of the Bible survived, along with a small number of canonical texts, like from Plato and Aristotle. But an untold amount of wisdom was lost to the ages,5 and there was little incentive to record more of it to the page.
The pursuit of knowledge seemed inherently futile, if not altogether vain. If today we feel a sense of impermanence because things are changing so rapidly, impermanence was a far more literal concern for the generations before us. There was “nothing new under the sun,” as the beautiful Bible verses in Ecclesiastes put it—not so much because everything had been discovered but because everything would be forgotten.6
The printing press changed that, and did so permanently and profoundly. Almost overnight, the cost of producing a book decreased by about three hundred times,7 so a book that might have cost $20,000 in today’s dollars instead cost $70. Printing presses spread very rapidly throughout Europe; from Gutenberg’s Germany to Rome, Seville, Paris, and Basel by 1470, and then to almost all other major European cities within another ten years.8 The number of books being produced grew exponentially, increasing by about thirty times in the first century after the printing press was invented.9 The store of human knowledge had begun to accumulate, and rapidly.
FIGURE I-1: EUROPEAN BOOK PRODUCTION
As was the case during the early days of the World Wide Web, however, the quality of the information was highly varied. While the printing press paid almost immediate dividends in the production of higher quality maps,10 the bestseller list soon came to be dominated by heretical religious texts and pseudoscientific ones.11 Errors could now be mass-produced, like in the so-called Wicked Bible, which committed the most unfortunate typo in history to the page: thou shalt commit adultery.12 Meanwhile, exposure to so many new ideas was producing mass confusion. The amount of information was increasing much more rapidly than our understanding of what to do with it, or our ability to differentiate the useful information from the mistruths.13 Paradoxically, the result of having so much more shared knowledge was increasing isolation along national and religious lines. The instinctual shortcut that we take when we have “too much information” is to engage with it selectively, picking out the parts we like and ignoring the remainder, making allies with those who have made the same choices and enemies of the rest.
The most enthusiastic early customers of the printing press were those who used it to evangelize. Martin Luther’s Ninety-five Theses were not that radical; similar sentiments had been debated many times over. What was revolutionary, as Elizabeth Eisenstein writes, is that Luther’s theses “did not stay tacked to the church door.”14 Instead, they were reproduced at least three hundred thousand times by Gutenberg’s printing press15—a runaway hit even by modern standards.
The schism that Luther’s Protestant Reformation produced soon plunged Europe into war. From 1524 to 1648, there was the German Peasants’ War, the Schmalkaldic War, the Eighty Years’ War, the Thirty Years’ War, the French Wars of Religion, the Irish Confederate Wars, the Scottish Civil War, and the English Civil War—many of them raging simultaneously. This is not to neglect the Spanish Inquisition, which began in 1480, or the War of the Holy League from 1508 to 1516, although those had less to do with the spread of Protestantism. The Thirty Years’ War alone killed one-third of Germany’s population,16 and the seventeenth century was possibly the bloodiest ever, with the early twentieth staking the main rival claim.17
But somehow in the midst of this, the printing press was starting to produce scientific and literary progress. Galileo was sharing his (censored) ideas, and Shakespeare was producing his plays.
Shakespeare’s plays often turn on the idea of fate, as much drama does. What makes them so tragic is the gap between what his characters might like to accomplish and what fate provides to them. The idea of controlling one’s fate seemed to have become part of the human consciousness by Shakespeare’s time—but not yet the competencies to achieve that end. Instead, those who tested fate usually wound up dead.18
These themes are explored most vividly in The Tragedy of Julius Caesar. Throughout the first half of the play Caesar receives all sorts of apparent warning signs—what he calls predictions19 (“beware the ides of March”)—that his coronation could turn into a slaughter. Caesar of course ignores these signs, quite proudly insisting that they point to someone else’s death—or otherwise reading the evidence selectively. Then Caesar is assassinated.
“[But] men may construe things after their fashion / Clean from the purpose of the things themselves,” Shakespeare warns us through the voice of Cicero—good advice for anyone seeking to pluck through their newfound wealth of information. It was hard to tell the signal from the noise. The story the data tells us is often the one we’d like to hear, and we usually make sure that it has a happy ending.
And yet if The Tragedy of Julius Caesar turned on an ancient idea of prediction—associating it with fatalism, fortune-telling, and superstition—it also introduced a more modern and altogether more radical idea: that we might interpret these signs so as to gain an advantage from them. “Men at some time are masters of their fates,” says Cassius, hoping to persuade Brutus to partake in the conspiracy against Caesar.
The idea of man as master of his fate was gaining currency. The words predict and forecast are largely used interchangeably today, but in Shakespeare’s time, they meant different things. A prediction was what the soothsayer told you; a forecast was something more like Cassius’s idea.
The term forecast came from English’s Germanic roots,20 unlike predict, which is from Latin.21 Forecasting reflected the new Protestant worldliness rather than the otherworldliness of the Holy Roman Empire. Making a forecast typically implied planning under conditions of uncertainty. It suggested having prudence, wisdom, and industriousness, more like the way we now use the word foresight. 22
The theological implications of this idea are complicated.23 But they were less so for those hoping to make a gainful existence in the terrestrial world. These qualities were strongly associated with the Protestant work ethic, which Max Weber saw as bringing about capitalism and the Industrial Revolution.24 This notion of forecasting was very much tied in to the notion of progress. All that information in all those books ought to have helped us to plan our lives and profitably predict the world’s course.
• • •
The Protestants who ushered in centuries of holy war were learning how to use their accumulated knowledge to change society. The Industrial Revolution largely began in Protestant countries and largely in those with a free press, where both religious and scientific ideas could flow without fear of censorship.25
The importance of the Industrial Revolution is hard to overstate. Throughout essentially all of human history, economic growth had proceeded at a rate of perhaps 0.1 percent per year, enough to allow for a very gradual increase in population, but not any growth in per capita living standards.26 And then, suddenly, there was progress when there had been none. Economic growth began to zoom upward much faster than the growth rate of the population, as it has continued to do through to the present day, the occasional global financial meltdown notwithstanding.27