Thoughts on The Bestseller Code, by Jodie Archer and Matthew L. Jockers

Too long, didn’t read summary:

Elements predictive of a bestseller

  1. Write about the real world: people, marriage, death, taxes, family, and modern technology.
  2. Plot your structure so that it regularly swings between emotional highs and lows.
  3. Write plainly and for a general audience.
  4. Create characters who go against the grain (female, morally grey) and who are active, not passive.
  5. Apparently, Dave Eggers’ The Circle is the perfect bestseller.

As much as I like my dozens of followers, I strive to be more popular. If not a Stephen King, then at least a John Scalzi. But like all creative people (and yes, I am a creative person, no matter what my inner critic says), I run into the capricious force that is public taste. I would love for my thoughts and dreams to reach a wider audience, but how can I maximize my exposure? Sure, I could give you all $1,000 to refer twelve friends to my website, but I’d probably starve before I even got a blurb on the Huffington Post. So the conundrum I’ve pondered since high school remains: How do I become popular? And how do I sustain that popularity? Oh, I know how you philosophical types will respond: Aren’t you a Stoic? Why care about popularity? Well, if I feel that I have something important to say, then I should try my hardest to spread that message as far and wide as possible, as long as I do it virtuously.

Full disclosure: I am writing a sci-fi novel. All of this is to ask how I can reach more people. Is it all just a crapshoot? Or is there a pattern to this madness? Enter The Bestseller Code: Anatomy of the Blockbuster Novel, by Jodie Archer and Matthew L. Jockers. There are a lot of books on writing novels out there, but this one caught my attention because it used machine learning to analyze individual books. To predict one simple outcome, whether or not a book was a bestseller, they tested thousands of features, such as whether the protagonist was female, the proportions of different themes, the number of times the word “and” was used, and the setting. This, of course, piqued my interest, as I use similar analytical techniques in my day job, and I dabble in AI. Would computers know how I could sell my story? This post will provide an overview of the book and my impressions of it.

For their experiment, Archer and Jockers took thousands of bestselling and non-bestselling books and had computer algorithms read every line of each book. The algorithms detected features of each book and then classified whether that book had made the New York Times bestseller list. If this sounds like exhausting work, it is. I can’t imagine being the poor grad student who would have to scan in the books or transcribe the texts. The authors mention networking thousands of computers together to analyze long works. For the algorithms themselves, they used K-Nearest Neighbors, Support Vector Machines, and Nearest Shrunken Centroids. I’m not going to describe these in detail, because that would require me to type out mathematical equations. The gist is that each algorithm learns, from the bestselling and non-bestselling training examples, which features tend to separate the two groups, and then uses those features to classify a new book as a bestseller or not. Their best algorithm had an 80% success rate, which may or may not be enough for your purposes. For me, if my novel had an 80% chance of becoming a success, I would pour all of my resources into promoting it. If an open-heart surgery had a predicted 80% chance of success, I’d be a little more hesitant.
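Since I skipped the math, here is a toy sketch of how one of those three methods, K-Nearest Neighbors, decides on a label. This is not the authors’ code or feature set: the three features (share of text about family, share about sex, contraction rate), the numbers, and the function names are all made up for illustration. The idea is simply that a new book gets whatever label most of its k closest training books have, and “success rate” is the fraction of held-out books it labels correctly.

```python
import math
from collections import Counter

# Hypothetical toy data: (share of text about family, share about sex,
# contraction rate) for each book, with its label. Numbers are invented.
TRAINING = [
    ((0.30, 0.02, 0.40), "bestseller"),
    ((0.25, 0.01, 0.35), "bestseller"),
    ((0.28, 0.03, 0.45), "bestseller"),
    ((0.05, 0.20, 0.10), "not bestseller"),
    ((0.08, 0.15, 0.12), "not bestseller"),
    ((0.04, 0.25, 0.08), "not bestseller"),
]

# Hypothetical held-out books the classifier never saw during training.
HELD_OUT = [
    ((0.27, 0.02, 0.42), "bestseller"),
    ((0.06, 0.18, 0.11), "not bestseller"),
]


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(features, training, k=3):
    """Label a book by majority vote among its k closest training books."""
    neighbors = sorted(training, key=lambda item: euclidean(features, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(examples, training, k=3):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(knn_predict(f, training, k) == label for f, label in examples)
    return correct / len(examples)


if __name__ == "__main__":
    my_manuscript = (0.22, 0.04, 0.38)  # hypothetical feature vector
    print(knn_predict(my_manuscript, TRAINING))  # bestseller
    print(accuracy(HELD_OUT, TRAINING))  # 1.0
```

With a toy set this tidy the accuracy comes out perfect; on thousands of real novels with thousands of messy features, something like the 80% the authors report is far more plausible.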

Let’s dive into what they found most predictive of bestselling status. Standard caveats apply: this is not a simple recipe for novel writing. There is plenty of wiggle room involved, which is both a blessing and a curse from a creative standpoint. They start with the claim that bestselling status is not random. This is not a new observation. You would be surprised not to see Stephen King, Danielle Steel, or John Grisham on the NYT Bestseller List. But the interesting thing is that these authors share certain features that predict success.

Chapter 2 attempts to answer the question: What should you write about? They agree with Stephen King that work (what?) is one of the best topics, but sex is not (WHAT?!). Yes, sure, some sex sells, but only in a niche market; it is not mainstream. After a lot of discussion of how they defined topics, they list the subject matter most likely to succeed: real people, marriage, death, taxes, family, and modern technology. Of course, it’s not as simple as that: novels must focus on one or two themes at a time, and they must present them realistically. Unfortunately, that means that fantasy and science fiction tend to be less successful. Well, since my novel involves a space war that doesn’t involve Earth, this was the first reason I wanted to throw this book against the wall. However, they did mention an exception, The Martian, so maybe there’s something to salvage.

Chapter 3 talks about the course of stories. You probably know about the three-act structure. The first act introduces the characters and the premise. The second act goes through the trials and tribulations of the characters and ends on a down note. The third act goes through the emotional climax, and the hero or heroine ends up on a different emotional plane from where they started. Think about Fifty Shades of Grey and Cujo. Sounds simple enough, but the book examines the plot structure of Fifty Shades of Grey more closely. The core conflict of that novel is emotional closeness: Will the heroine submit to the male love interest or not? It is an ongoing conflict throughout the book that twists and turns, and these twists and turns follow a pattern of emotional highs and lows. (Spoiler alert!) The novel ends with an emotional high immediately followed by an emotional low. So the lesson is clear: plot your story carefully, and make it an emotional rollercoaster. For example, if you follow a scene of emotional triumph (A graduation! Defeating the enemy! They fall in love!) with a tragedy (A sibling dies! The world is still ending! He breaks up with her because of his dark past!), you will hold the reader.

Chapter 4 is about style. J.K. Rowling and Stephen King each have a distinctive style that a computer algorithm can identify. The authors say a lot about style, but I will try to be brief. Avoid the word “very”. Make the opening sentence short, snappy, and reflective of the primary conflict. Be less formal, but avoid excessive exclamation points! Use more contractions. Be plain rather than decorative, like a fir tree rather than a Christmas tree adorned with golden baubles, lights, angels, and stars. We also can’t forget gender. Women by far dominate the bestseller list. This isn’t something inherent in gender (though I suspect the authors avoid further speculation out of political correctness); rather, women tend to write in a style better suited to a general audience.
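Most of this style advice boils down to things a computer can count. As a rough illustration (not the book’s actual feature definitions, which I don’t reproduce here), this hypothetical `style_features` function measures uses of “very”, contractions, and exclamation points per 1,000 words, plus the length of the opening sentence. Treating any word with an apostrophe inside it as a contraction is my own simplifying assumption, and it will also catch possessives like “Mae’s”.

```python
import re

# A "word" is letters, optionally joined by straight or curly apostrophes
# (so "can't" and "it’s" each count as one word). Simplified assumption.
WORD = r"[A-Za-z]+(?:['’][A-Za-z]+)*"


def style_features(text):
    """Rough style counts per 1,000 words, using illustrative definitions."""
    words = re.findall(WORD, text)
    n = len(words) or 1  # avoid dividing by zero on empty text
    per_thousand = 1000 / n
    very = sum(1 for w in words if w.lower() == "very")
    # Any word with an internal apostrophe counts as a contraction here.
    contractions = sum(1 for w in words if "'" in w or "’" in w)
    exclamations = text.count("!")
    # Everything before the first ., !, or ? is treated as the opening sentence.
    opening = re.split(r"[.!?]", text, maxsplit=1)[0]
    return {
        "very_per_1000": very * per_thousand,
        "contractions_per_1000": contractions * per_thousand,
        "exclamations_per_1000": exclamations * per_thousand,
        "opening_sentence_words": len(re.findall(WORD, opening)),
    }


if __name__ == "__main__":
    sample = "I can't stop. It's very late, and I won't sleep!"
    print(style_features(sample))
```

For the ten-word sample, that works out to 100 uses of “very”, 300 contractions, and 100 exclamation points per 1,000 words, with a three-word opening sentence. Per-1,000-word rates let you compare a short story with a doorstopper on equal footing.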

Chapter 5 is about characters. The arguments in this chapter make the most intuitive sense. Unpredictable characters (especially if they’re not the girl-next-door type) draw in audiences. They must also do things. They must not be passive. They must desire something, and they must act to acquire it. They must be confident about acquiring the object of their desire. They must act, not wait. They must be, not seem. They must grab, not hesitate. The second reason I wanted to throw this book against the wall was that my main character is a shy guy involved in a space war that doesn’t involve Earth. Oh well, I don’t control public tastes. Perhaps my next novel will be about a morally grey woman in the gross underbelly of Houston, Texas.

Chapter 6 is a summary of their main points. Their algorithm points to Dave Eggers’ The Circle as the perfect bestseller. I’ve never read the book, but according to Archer and Jockers, it has all the right ingredients: a killer title; a compelling first sentence (“My God, Mae thought, it’s heaven.”); an idiosyncratic female protagonist; her failed relationships; and our fear of technology. Unfortunately, without access to sales data, I cannot validate their choice, although since a movie was made based on the book, it was presumably reasonably popular.

There is a lot to digest in this book, and many of its claims need clarification. There is that feeling of deflation that the type of novel I envision doesn’t fit the zeitgeist. There is the envy that other people had the bright idea of applying machine learning to literature before I thought of it. However, I think the authors gloss over the potential problems with their analysis. Some of these are not under their control. They had to train their algorithms on already published books. Their definition of a bestseller was based on the NYT list. They did not have access to specific sales data. And, most of all, their analysis was based on the US market. Perhaps readers in other countries do not like idiosyncratic female characters. If I were dictator of the publishing world, I would test these predictions on unpublished manuscripts and on datasets that include the rest of the world. There’s more to the market than just Americans and machines.
