What if there was an algorithm that could predict which novels become mega-bestsellers? Are books like Dan Brown’s The Da Vinci Code and Gillian Flynn’s Gone Girl the Gladwellian outliers of publishing? The Bestseller Code boldly claims that the New York Times bestsellers in fiction are predictable and that it’s possible to know with 97% certainty if a manuscript is likely to hit number one on the list as opposed to numbers two through fifteen.
The algorithm does exist; the code has been cracked; the results are in; and they are stunning. The system analyzes themes, plot, character, pacing, even the frequency of words and punctuation, to predict which stories will resonate with readers. A 28-year-old heroine is a big plus. So is realism. So is giving 30% of your novel to only two specific topics. And if you can include a dog rather than a cat and few sex scenes, you have a better chance of writing a bestselling novel.
The project is an investigation into our intellectual and emotional responses as humans and readers to books of all genres. It is a big idea book that will appeal to fans of The Black Swan by Nassim Taleb, a book for data-mining nerds, as well as a book about writing, reading, and publishing. Anyone who has ever wondered why Gone Girl, The Girl on the Train, or The Girl with the Dragon Tattoo captured so many readers worldwide will find their interest piqued.
Publisher: St. Martin's Press
Product dimensions: 5.80(w) x 8.40(h) x 1.10(d)
About the Author
Jodie Archer bought and edited books for Penguin UK before she decamped for the doctoral program in English at Stanford University. After her PhD, she worked at Apple as their research lead on literature. She is now a full time writer.
Matthew L. Jockers was the co-founder of Stanford University’s Literary Lab in Silicon Valley. His digital humanities work has been profiled in the New York Times, The LA Review of Books, and more. He is Associate Professor of English at the University of Nebraska in Lincoln.
Read an Excerpt
The Bestseller Code
Anatomy of the Blockbuster Novel
By Jodie Archer, Matthew L. Jockers
St. Martin's Press
Copyright © 2016 Jodie Archer and Matthew L. Jockers
All rights reserved.
THE BESTSELLER-OMETER, OR, HOW TEXT MINING MIGHT CHANGE PUBLISHING
Back in the spring of 2010, Stieg Larsson's agent was having a good day. On June 13, The Girl Who Kicked the Hornets' Nest — third in the series from a previously unknown author — debuted at number one in hardback in the New York Times. You can imagine the lists would have been a pleasing sight over morning coffee. Hornets' Nest straight in at the top, Dragon Tattoo at number one in two paperback formats, and The Girl Who Played with Fire a roundly satisfying number two. This had been going on for forty-nine weeks in the U.S., and for three solid years in Europe. It would have been hard not to be smug.
The following month Amazon would announce Larsson was the first author ever to sell a million copies on the Kindle, and over the next two years sales in all editions would top seventy-five million. Not bad for an unknown political activist–turned-novelist from a little Scandinavian country, especially one who had chosen a rather uncharming title in Swedish and had written some brutal scenes of rape and torture. Men Who Hate Women — or The Girl with the Dragon Tattoo as it was renamed in English — was the sensation book of the year in more than thirty countries.
The press didn't understand the success. Major newspapers commissioned opinion pieces on what on earth was going on in the book world. Why this book? Why the frenzy? What was the secret? Who could have known?
Answers were lackluster. Reviewers scratched their heads about it. They found fault with the novel's structure, style, plotting, and character. They groaned over the translations. They complained about the stupidity of the reading public. But still copies sold as fast as they were printed — whether you were in the UK, the U.S., in Japan, or in Germany; whether you were male, female, old, young, black, white, straight, or gay. Whoever you were, practically anywhere, you knew people who were reading those books.
That doesn't happen very often in the book world. The industry might enjoy a phenomenon breakout like Larsson once a year, if that. E. L. James has been the biggest breakout since, with Fifty Shades of Grey, and unlike Larsson she was available for a big publicity tour. Larsson had died before publication. The level of sales his trilogy achieved without even the backing of its author was supposedly just unfathomable. Freakish. Unpredictable.
Let's consider some numbers. A company in Delaware called Bowker is the global leader in bibliographic information and the exclusive provider for unique identification numbers (ISBN) for books in the U.S. Their annual report states that approximately fifty to fifty-five thousand new works of fiction are published every year. Given the increasing number of self-published ebooks that carry no ISBN, this is a conservative number. In the U.S., about two hundred to two hundred twenty novels make the New York Times bestseller lists every year. Even with conservative numbers, that's less than half a percent of works of fiction published. Of that half a percent, even fewer hit the bestseller lists and stay there week after week to become what the industry calls a "double-digit" book. Only handfuls of authors manage those ten or more weeks on the list, and of those maybe just three or four will sell a million copies of a single title in the U.S. in one year. Why those books?
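The "less than half a percent" figure can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the ranges quoted in the passage above; the function and variable names are illustrative and do not come from the book.

```python
# Approximate annual figures quoted in the passage above.
published_low, published_high = 50_000, 55_000  # new works of fiction per year
charting_low, charting_high = 200, 220          # novels reaching the NYT lists


def rate_percent(charting: int, published: int) -> float:
    """Share of published novels that reach the list, as a percentage."""
    return 100 * charting / published


# Lowest rate: fewest charting novels out of the most published titles.
lowest = rate_percent(charting_low, published_high)
# Highest rate: most charting novels out of the fewest published titles.
highest = rate_percent(charting_high, published_low)

print(f"bestseller rate: {lowest:.2f}% to {highest:.2f}%")
```

Even the most generous pairing of the quoted ranges stays below one-half of one percent.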
Traditionally, it is believed that there are certain skills a novelist needs to master in order to win readers: a sense of plot, compelling characters, more than basic competence with grammar. Writers with big fan bases have mastered more: an eye for the human condition, the twists and turns of plausibility, that rare but appropriate use of the semicolon. These are good writers, and with time and dedication almost all genuinely good writers will find their audience. But when it comes to the kind of success involved in hundreds of thousands of people reading the same book at the same time — this thriller and not that thriller, this potential Pulitzer and not that potential Pulitzer — well, unless Oprah is involved, that signals the presence of a fine stardust that's apparently just too difficult to detect. The sudden and seemingly blessed success of books like the Dragon Tattoo Trilogy, Fifty Shades of Grey, The Help, Gone Girl, and The Da Vinci Code is considered very lucky, but as random as winning the lottery.
The word "bestseller," by the way, has always been a book world term, and as a word it is relatively young. It first entered the dictionary in the late nineteenth century, about the time of the first list of books ranked by consumer sales. While it should be a neutral term, it has developed some connotations that are likely misleading. The literary magazine The Bookman started to print "Sales of Books during the Month" in 1891 in London and in 1895 in New York after the International Copyright Act of 1891 slowed down the distribution of cheap pirated copies of British novels. Until then, no sales statistics had really been possible. From the beginning, the lists — which were printed in each major city and typically reported the top six sellers of the month — were about two things that were new to the book world. The bestseller lists were about sales as the only criterion for inclusion, and a proxy recommendation system for what to read next. These recommendations were based not on the choices of a select few reviewers or publishers, but on the choices of everyday fellow readers. The reader's choice was and still is the only vote. The term "bestseller," then, should carry no intrinsic comment on quality or type of book, and is not a synonym for either "genre" or "popular fiction." While the word has often been used pejoratively by some members of the literary establishment, who have felt that the collective taste of the reading market signals bad literature, the data itself suggests a less subjective and more balanced truth. Bestsellers include Pulitzer Prize winners and Great American Novels as well as books by famous mass-market writers. The list can house Toni Morrison and Margaret Atwood alongside Michael Connelly and Debbie Macomber. This is why the bestseller list is such a rich cultural construct and so dynamic to study.
Obviously there's a lot of value in writing one of those books. There's a lot of value in finding those books as an agent or editor. There's a lot of value for retailers, too — the top few titles alone are why some retailers are able to stay in business and keep selling books at all.
Of course, we are talking for now of value in monetary terms. Imagine a seven- or even eight-figure advance for finally getting onto the page that book you are always telling your friends is inside you. Not many authors command that kind of clout in one territory, but they are certainly around. And you can glamorize the impoverished artist with his pen and notebook as much as you like, but wouldn't it be nice to think of the story you just made up as appearing on bedside tables, beside bathtubs, and on commuter iPads and Kindles in different languages all over the world?
The key sellers of a given year bring the glamor and the drama. They represent the houses in the Hamptons, the fancy cars and diamond tiaras of the literary domain. Hit the lists and stay there for a while and you will be revered, respected, loathed, and condemned. You might be asked to judge a prize or review other books. Maybe your movie rights will be optioned. People will be talking.
Wouldn't it be fun if success weren't so random?
The bold claim of this book is that the novels that hit the New York Times bestseller lists are not random, and the market is not in fact as unknowable as others suggest. Regardless of genre, bestsellers share an uncanny number of latent features that give us new insights into what we read and why. What's more, algorithms allow us to discover new and even as yet unpublished books with similar hallmarks of bestselling DNA.
There is a commonly repeated "truth" in publishing that success is all about an established name, marketing dollars, or expensive publicity campaigns. Sure, these things have an impact, but our research challenges the idea it's all about hype in a way that should appeal to those writers who toil over their craft. Five years of study suggests that bestselling is largely dependent upon having just the right words in just the right order, and the most interesting story about the NYT list is about nothing more or less than the author's manuscript, black ink on white paper, unadorned.
Using a computer model that can read, recognize, and sift through thousands of features in thousands of books, we discovered that there are fascinating patterns inherent to the books that are most likely to succeed in the market, and they have their own story to tell about readers and reading. In this book we will describe how and why we built such a model and how it discovered that eighty to ninety percent of the time the bestsellers in our research corpus were easy to spot. Eighty percent of New York Times bestsellers of the past thirty years were identified by our machines as likely to chart. What's more, every book was treated as if it were a fresh, unseen manuscript and then marked not just with a binary classification of "likely to chart" or "likely not to," but also with a score indicating its likelihood of being a bestseller. These scores are fascinating in their own right, but as we show how they are made we will also share our explanation for why that book on your bedside table is so hard to put down.
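The idea of scoring every book "as if it were a fresh, unseen manuscript" can be illustrated with a leave-one-out loop. The sketch below is not the authors' model: it swaps in a simple nearest-centroid rule, the two-number feature vectors and titles are invented toy data, and the 0.5 cutoff is an assumption. It shows only how one held-out score can yield both a likelihood-style number and a binary "likely to chart" label.

```python
from math import dist

# Hypothetical toy corpus: (title, feature vector, charted?). All values invented.
corpus = [
    ("A", (0.90, 0.80), True),
    ("B", (0.80, 0.90), True),
    ("C", (0.85, 0.75), True),
    ("D", (0.20, 0.10), False),
    ("E", (0.10, 0.30), False),
    ("F", (0.15, 0.20), False),
]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def score_unseen(features, training):
    """Score in [0, 1]; higher means closer to the charting books' centroid."""
    hits = [f for _, f, charted in training if charted]
    misses = [f for _, f, charted in training if not charted]
    d_hit = dist(features, centroid(hits))
    d_miss = dist(features, centroid(misses))
    return d_miss / (d_hit + d_miss)


results = {}
for i, (title, features, charted) in enumerate(corpus):
    # Hold the book out so it is scored only against the others.
    training = corpus[:i] + corpus[i + 1:]
    score = score_unseen(features, training)
    results[title] = (score, score >= 0.5)  # assumed cutoff for "likely to chart"

for title, (score, likely) in results.items():
    print(f"{title}: score={score:.3f} likely_to_chart={likely}")
```

Holding each book out before scoring it keeps the model from simply recognizing a title it has already seen, which is the point of treating every book as an unseen manuscript.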
Consider some of these percentages. The computer model's certainty about the success of Dan Brown's latest novel, Inferno, was 95.7 percent. For Michael Connelly's The Lincoln Lawyer it was 99.2 percent. Both were number one in hardback on the NYT list, which for a long time has been one of the most prestigious positions to occupy in the book world. These are veteran authors, of course, already established. But the model is unaware of an author's name and reputation and can just as confidently score an unknown writer. The score for The Friday Night Knitting Club, the first novel by Kate Jacobs, was 98.9 percent. The Luckiest Girl Alive, a very different debut novel by Jessica Knoll, had a bestselling success score of 99.9 percent based purely on the text of the manuscript. Both Jacobs and Knoll stayed on the list for many weeks. The Martian (before Matt Damon's interest in playing the protagonist) got 93.4 percent. There are examples from all genres: The First Phone Call from Heaven, a spiritual tale by Mitch Albom, 99.2 percent; The Art of Fielding, a literary debut by Chad Harbach, 93.3 percent; and Bared to You, an erotic romance by Sylvia Day, 91.2 percent.
These figures, which provide a measure of bestselling potential, have made some people excited, others angry, and more than a few suspicious. In some ways that is fair enough: the scores are disruptive, mind-bending. To some industry veterans, they are absurd. But they also could just change publishing, and they will most certainly change the way that you think about what's inside the next bestseller you read.
We should make it clear that none of the books we reference were acquired based on our model's figures, and figures, beyond the ones you'll read about here, have never been formally shared with any agent or publishing house. We should also be clear that these figures are specific to the closed world of our research corpus, a corpus we designed to look like what you'd see if you walked into a Barnes & Noble with a wide selection to choose from. Agents and editors do a good job of putting books in front of consumers — it's not as though we are short of things to read. And some individuals in publishing have a particular reputation for the Midas touch. But remember that the bestseller rate in the industry as it stands is less than one-half of one percent. That's a lot of gambling before a big win. Note, too, that year after year, the lists comprise the names of the same long-standing mega-authors. Stephen King is sixty-eight. James Patterson is sixty-eight. Danielle Steel is sixty-eight. As much as fans are still thrilled by another new novel from one of these veteran writers, it is telling that the publishing world has not discovered the next generation of authors who will similarly enjoy thirty to forty years of constant bestselling. Nor did the industry find, despite the thousands of manuscripts both rejected and published annually, a runaway bestseller for 2014 (Dragon Tattoo, Fifty Shades, and Gone Girl had been the standout hits of previous years), and neither did it publish a manuscript to impress the Pulitzer Prize committee in 2012. Why?
Well, it is a universal wisdom that bestsellers are freaks. They are the happy outliers. The anomalies of the market. Black swans. If that is the truth, then once you find a bestselling writer, why put your money anywhere else? Why put your millions on a new twenty-year-old writer instead of Stephen King? How could you possibly know if a new literary author is worth the sort of investment worthy of a future big-prize winner?
Book publishing is, aptly, full of the language of gambling. Acquisitions meetings often revolve around passionate arguments about choosing whether or not to "back a debut author." The excitement of a bidding war across different publishing houses might have you go "all in" and spend almost your entire season's budget on one book. The process is fun, and guesses are certainly educated, but it's a casino. Before finding a home at Bloomsbury, J. K. Rowling's Harry Potter was turned down by twelve publishers, and Rowling was told "not to quit her day job." The Harry Potter brand is now worth an estimated $15 billion. John Grisham was rejected by at least sixteen different publishers. Since then, Grisham has written the biggest seller of the year more than a dozen times. James Patterson was repeatedly rejected as he tried to get published. In 2010, he sold more than 3.5 million copies of his three titles that year. Kathryn Stockett was turned down by sixty agents before she found someone willing to represent The Help. That novel went on to spend one hundred weeks on the NYT bestseller list. There are, no doubt, many similar writers whose work currently sits discarded on the so-called slush piles of new manuscripts in offices all over New York and London.
Anyone connected even tangentially to the world of readers and writers knows a friend of a friend who got up for months at 4 A.M. to write her novel before work, who felt inspired by a killer story, who knew the muses were around, and who, having sent manuscripts all over Manhattan, gleeful and expectant, received nothing more than standard rejection slips.
Those friends of friends might be in good company. One editor who read the manuscript of The Spy Who Came in from the Cold told John le Carré that he had no future as a writer. William Golding's Lord of the Flies was rejected twenty-one times. After writing the now iconic On the Road, Jack Kerouac received a letter from an agent stating, "I don't dig this one at all." Ursula Le Guin was rejected on grounds of being "unreadable." That unreadable novel went on to win two major awards. Even George Orwell's novella Animal Farm was deemed unpublishable, and that by none other than T. S. Eliot. The great poet thought one of the most canonized political allegories of all time was "not convincing."
To publish or not to publish is a tough question. Big success prediction in the realms of storytelling can involve trying to estimate the sensibilities and inner selves of hundreds of thousands of different people. It is no easy job, and often the rationale behind decisions seems perfectly understandable. The U.S. editors who rejected The Girl with the Dragon Tattoo, for instance (and we have asked some of them), thought that American readers would be bored by all the Swedish politics in the novel. They thought Lisbeth Salander was a bit moody and aggressive for a female lead. They believed the mainstream would respond badly to a book with horrific scenes of anal rape and the avenging Lisbeth with her tattooing needles. That seems a quite reasonable reaction.
It's no surprise, then, that editors, when perfectly honest, sometimes claim that big success prediction ranges somewhere between a wet finger held up into the air, and the mysterious crystal ball that the highest paid agents and publishers seem to conceal under their desks. Unless the author is already a big name, a James Patterson or a Nora Roberts, it's a crapshoot. Sometimes, circumstances help — now and again your author is a Hollywood diva and her subject is her sex life — but even when it seems like a sure bet, we have seen some of the vast print runs that follow big advances end up in the pulping machine. The public is fickle.
Excerpted from The Bestseller Code by Jodie Archer, Matthew L. Jockers. Copyright © 2016 Jodie Archer and Matthew L. Jockers. Excerpted by permission of St. Martin's Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Table of Contents
1 The Bestseller-ometer, or, How Text Mining Might Change Publishing
2 The Godparents, or, Why You Must Take Time to Date
The Lists: Theme
3 The Sensations, or, How to Form Some Perfect Curves
The Lists: Plot
4 The Debutantes, or, Why Every Comma Matters
The Lists: Style
5 The Noirs, or, What the Girl Needs
The Lists: Character
6 The One, or, When the Algorithm Winked
The Lists: All Data Points
Epilogue The Machine-Written Novel, or, Why Authors Really Matter
Postscript, or, Some Background on Method
Most Helpful Customer Reviews
Heard about the book and bought it. Loved it. The DNA and plot of the best sellers are not subjects I like to read or write, but there is a lot of substance in the book for all writers. The use of strong nouns, verbs, the sparse use of adverbs and adjectives, the determination of words like need and want are all points that authors can take into their writing and increase the readership. Of course, that's just my opinion. If the writer is seeking to make a good living from writing--follow this book. If you want to write stories that you think need to be told, write away but take some of their recommendations into consideration. This book will be in college courses, critique groups, writers' conferences and guilds, and might well force the Big Four to continue with their propensity to lean their offerings this direction.
Just the best!