The Desolation of Statistics: Book Length vs. Movie Length, Part 2

“A wizard’s data is never accurate nor inaccurate. He graphs it precisely how he means to.”

[Update: I’ve uploaded the raw data to Google Docs for your number crunching pleasure.]

In last week’s article, I started with a simple question: how do book lengths, as measured by word count, compare to their adapted movies’ run times, as measured in seconds? I was mostly looking for a statistical basis to express my displeasure at The Hobbit: An Unexpected Journey (and by extension, parts 2 and 3 of this unnecessary trilogy), but I wound up comparing the density of the Hobbit movies, as measured in Words in Book per Second of Movie (WIBPSOM), to other prominent movie adaptations of books: The Lord of the Rings, The Hunger Games, and the Twilight franchises.
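
As a quick sketch of the metric itself (the function name `wibpsom` is an illustrative choice, not anything from the author’s spreadsheet), WIBPSOM is simply the book’s word count divided by the movie’s runtime converted to seconds:

```python
def wibpsom(word_count: int, runtime_minutes: float) -> float:
    """Words in Book per Second of Movie: book word count / movie runtime in seconds."""
    if runtime_minutes <= 0:
        raise ValueError("runtime must be positive")
    return word_count / (runtime_minutes * 60)


# Order of the Phoenix figures quoted later in this post: 257,154 words, 138 minutes.
print(round(wibpsom(257_154, 138), 2))  # 31.06
```

Higher values mean more book squeezed into each second of film; a long book split across several movies drives the value down.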

The findings were interesting in and of themselves (TL;DR: the Hobbit movies have way smaller WIBPSOM values than the other franchises), but they begged for a larger-scale analysis, both in size of dataset and scope of inquiry. To address the size of the dataset, I found all of the (English-language) entries on this list of best-selling books that have theatrically released, non-silent movie adaptations. After including multiple movie adaptations of the same book and excluding movies where I couldn’t find any data on book length as measured by word count, I came up with a dataset of 59 movie adaptations of best-selling books.

As for scope of inquiry, well, let’s get down to brass tacks: is there any relationship between the density of a book’s movie adaptation, as measured by WIBPSOM, and the quality of the movie, as measured by its IMDB rating?

In a word, the answer to this intriguing question is an emphatic “no.”

[Figure: scatter plot of WIBPSOM vs. IMDB rating, with trend line]

Don’t let the slightly inclined trend line fool you into thinking there’s any there there: the positive correlation is weak to the point of being utterly nonexistent. Let me repeat: there is no statistical correlation, much less a suggestion of causation, between the density of a book’s movie adaptation as measured by WIBPSOM and its quality as measured by IMDB ratings. None whatsoever.
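
The post doesn’t report a correlation coefficient, but “weak to the point of being nonexistent” can be put in numbers with Pearson’s r. A minimal sketch, assuming the WIBPSOM values and IMDB ratings sit in two parallel lists (the list names in the usage comment are illustrative, not from the author’s spreadsheet):

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical usage with parallel lists (not real data from the post):
# r = pearson_r(wibpsom_values, imdb_ratings)
# print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

An r close to 0 (and an r² close to 0, meaning WIBPSOM explains almost none of the variation in ratings) is what a nearly flat trend line looks like numerically; r by itself says nothing about statistical significance.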

But did any of you seriously expect there to be? I hope not. I acknowledged over and over again that this sort of analysis is highly reductive by its nature and absolutely does not take into account the different ways books convey plot or the different ways movies add to, subtract from, and otherwise distort those stories, to say nothing at all about how they go about adapting those stories to the visual medium. That being said, I felt like we had to run these numbers and put this question to rest. So you’re welcome, Internet.

Now that we’ve put that question aside, let’s press on with less crucial but still interesting avenues of inquiry. Another idea that I surfaced in Part 1 was that studios are splitting single novels into multiple movies as cynical cash grabs, The Hobbit being just the most egregious example of this practice. The last installments of the Twilight, Harry Potter, and Hunger Games franchises have been or will be split into two parts, which led me to ask: is this a recent trend, a symptom of Hollywood’s increasing greed and laziness?

The answer is “no, probably not.”

[Figure: WIBPSOM of movie adaptations over time]

Sure, there seems to be a slight downward trend starting from 1980 through the current decade, but remember that the current decade isn’t complete yet and has years of movie adaptations left to counteract the Hobbit-induced low WIBPSOM. In other words, if you’re looking for metrics that point to increasing Hollywood greed and laziness, you’re better off looking elsewhere.
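
One way to check how much that slight downward trend leans on the incomplete current decade is to fit an ordinary least-squares slope with and without the most recent films. A sketch, assuming release years and WIBPSOM values in parallel lists; the helper names and the 2010 cutoff in the usage comment are illustrative:

```python
def ols_slope(years: list[float], values: list[float]) -> float:
    """Ordinary least-squares slope of values regressed on years (change per year)."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("slope is undefined when all years are equal")
    return sxy / sxx


def before(years: list[float], values: list[float],
           cutoff_year: int) -> tuple[list[float], list[float]]:
    """Keep only the films released strictly before cutoff_year."""
    kept = [(x, y) for x, y in zip(years, values) if x < cutoff_year]
    return [x for x, _ in kept], [y for _, y in kept]


# Hypothetical usage: compare the slope with and without the current (incomplete) decade.
# full = ols_slope(years, wibpsom_values)
# trimmed = ols_slope(*before(years, wibpsom_values, 2010))
```

If the slope shrinks toward zero once the newest films are dropped, the “trend” is mostly the Hobbit effect rather than a long-running pattern.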

The last idea from Part 1 that warranted further analysis with an expanded dataset was that there might be a “sweet spot” WIBPSOM value that moviemakers should shoot for when adapting books into movies. If “sweet spot” implies a level that leads to quality, then, as we demonstrated with the first graph, such a thing does not exist. But if we think of the “sweet spot” as a value that’s typical across the large range of book-to-movie adaptations, then sure, we can use the data to see if such a thing exists.

And hey, it seems like we have something here!

[Figure: distribution of WIBPSOM values across the 59 movie adaptations]

Among the 59 movies in my dataset, it’s most common for adaptations to fall in the 10-15 WIBPSOM range. Titles ranging from To Kill a Mockingbird to both parts of Twilight: Breaking Dawn land here.

Now, does this mean that a filmmaker should aim for the 10-15 WIBPSOM range? Absolutely not, for all the reasons I’ve described above. Nevertheless, it’s interesting to see how this distribution naturally shakes out into a tidy bell curve centered around 10-15 WIBPSOM. And in case you’re wondering, the sole occupant of the 30-35 band is Harry Potter and the Order of the Phoenix: 257,154 words, 138 minutes, and a 31.06 WIBPSOM.
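
The bands in the distribution chart can be reproduced by bucketing each WIBPSOM value into 5-unit bands. This sketch assumes half-open bands (a value of exactly 15 lands in 15-20, not 10-15); the post doesn’t say which convention the chart used, and the function names are illustrative:

```python
import math
from collections import Counter


def band_counts(values: list[float], width: float = 5.0) -> dict[tuple[float, float], int]:
    """Count values per half-open band [k * width, (k + 1) * width)."""
    counts = Counter(math.floor(v / width) for v in values)
    return {(k * width, (k + 1) * width): counts[k] for k in sorted(counts)}


def modal_band(values: list[float], width: float = 5.0) -> tuple[float, float]:
    """Most populated band; ties go to the lowest band."""
    counts = band_counts(values, width)
    return max(counts, key=counts.get)


# Order of the Phoenix's 31.06 lands in the 30-35 band mentioned above.
print(band_counts([31.06]))  # {(30.0, 35.0): 1}
```

Run over all 59 values, `modal_band` would name the most common band, which is the “typical” sense of sweet spot described above, not a recipe for quality.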

So what, then, are we to make of these findings (and non-findings)? Not a whole lot, I’ll readily admit. I’m glad that I’ve sparked a lot of conversation about this topic, and that we’re able to invoke some data in the conversation, but in some ways I’m equally glad that I’ve been able to use that same data to show its limited usefulness when it comes to describing something as subjective as plot content in a novel or movie. At the end of the day, after we’re done accusing movie studios of making cynical cash grabs, and after we’re done accusing me of making reductive data models (guilty as charged), we still have the utterly unpredictable and unquantifiable art of storytelling.

418,053 words, 238 minutes, 29.28 words in book per second of movie. As God is my witness, I’ll never go without data again.

19 Comments on “The Desolation of Statistics: Book Length vs. Movie Length, Part 2”

  1. Stokes

    I think the WIBPSOM vs. IMDB rating chart is flawed because the quality of the source material is going to be a confounding factor. I feel like ideally you would want to replace the IMDB rating with the change in critical reputation between the book and the movie, so that something like The Great Gatsby, where the movie got panned but the source is a bona fide classic, would have a much lower score than something like Twilight IV, which was garbage in/garbage out.

    This is tricky, though, because as far as I know there’s no equivalent rating site for books.

     
    • Mark Lee

      That’s a good point. Maybe cross-reference the list of best sellers with a list of greatest books? That would, at the least, weed out things like Twilight and the Dan Brown novels.

       
    • Laura

      Chick flicks score poorly on IMDB because most IMDB voters are male. This is not accounted for in this analysis.

       
      • Julio

        Twilight movies and books are horrible. They may serve a specific audience, but the books were poorly written and the movies poorly made. Bland, dull, poor acting, bad special effects, and so on. If you check out Rotten Tomatoes (an aggregate of paid movie reviewers) or any other site, they score very poorly. You’re fine to enjoy them, but you shouldn’t confuse your own preference with quality. I love the movies Waiting and Grandma’s Boy; that doesn’t make them good movies.

         
  2. Anton Sirius

    I missed the first post, so I’ll pick up the conversation from this comment, which I completely agree with.

    Basing this study on a Hobbit vs. LOTR comparison is just silly. The quest in The Hobbit spans more than enough time and events to cover three movies. Remember (or if you’ve never read it, spoilers!): Bilbo was gone so long that he’d been declared legally dead, and they were auctioning off his stuff by the time he got back to Bag End. It’s just written in a much simpler style than the LOTR trilogy.

    If you have issues with the Hobbit’s pacing, it has nothing to do with the amount of material covered by the movie.

    Besides, it’s downright manic compared to Where the Wild Things Are’s languid 0.05 WIBPSOM…

     
    • Mark Lee

      First, I understand why you and others are getting bent out of shape over the Hobbit/LOTR analysis. The text is near and dear to a lot of people’s hearts, and you’re right, the writing styles of the books are different. But this is an exercise in cold, dispassionate data analysis, one that intentionally glosses over subjective differences in writing style. My only response is to try to distinguish the forest (the aggregate dataset) from the trees (quibbles with particular books/movies).

      Second, to your point about Where the Wild Things Are — I intentionally omitted it from the dataset because it’s primarily an illustrated book. Sorry, I should have mentioned that additional caveat in the description of the dataset. You can see from the Google Doc with the raw data that it’s not factored into the calculations.

       
  3. Demosthenes

    Interesting analysis. We’ll be linking from TheOneRing.net, and it’s sure to cause (another) stir. Just a quick note, though: you need to correct your spelling of “Tolkein” to “Tolkien” in the first graphic at the top of the page.

     
    • Mark Lee

      Thanks for linking, and thanks for pointing out the typo. It’s corrected in this article and in Part 1.

       
  4. Julio

    Do these ratings factor in all the appendices used from the ROTK in the Hobbit films? It’s not just text from The Hobbit that should be used in the formula. Additionally, how much text from the books was left out of the LOTR movies? In comparison, all events and more from The Hobbit will be covered in the movies, whereas in LOTR, huge chunks of text were left out. The same goes for the later Harry Potter films, particularly the adaptation of book 5, the longest book yet not the longest film, where huge chunks of text were omitted. It would be extremely laborious, but a real statistical analysis of this trend would determine the exact number of words/pages from the source (book) actually used in the movie, to come up with a more accurate description of what you’re trying to purport.

     
    • Mark Lee

      “Do these ratings factor in all the appendices used from the ROTK in the Hobbit films?”

      A lot of folks have been asking about this; unfortunately, the answer is no. You go on to basically describe the problem with this: it would be incredibly laborious to do this consistently across all of these books. Many have argued that what I’ve done here is a gross over-simplification, particularly with the Tolkien books. Not to be flip, but that’s kind of the point. To the extent that this analysis has any significance at all, it comes from a consistently applied aggregation methodology.

       
  5. Wenyip

    I wonder if the reverse comes out to a similar bell curve around 10-15 WIBPSOM… that is, the length of the novelisations of movies. My instinct is that they’d be shorter than original books adapted into films, but it’d be interesting to find out.

     
  6. Jane

    About the Hobbit movies: I’m wondering if you are going by the word count of just the Hobbit, or if you included the source material from the appendices of the Lord of the Rings or other works that are included in the movies. Unlike most of the movies you are comparing, the Hobbit movies are supposed to give us more plot than was strictly contained in the book. Not that the first movie didn’t feel thin to me too… but I’d like to see it not completely short-changed in this analysis.

     
  7. Tom

    If you do WIBPSOM/1 you’ll get a “speed of story-telling in film”, and I bet it’s a Gaussian distribution.

     
  8. simhedges

    I wonder if you have considered short stories? The Birds, for example, or Something Wicked This Way Comes, The Bicentennial Man, A View to a Kill, and so on.

     
  9. Grim_ungainly

    The WIBPSOM distribution looks like a Gamma distribution to me. Perhaps we could do some parameter estimation and see if there is any reasonable fit. What distribution would be expected if both components of WIBPSOM were independent, randomly distributed variables?

     
  10. Timothy J Swann

    Mark, you didn’t factor into your analysis every detail of potential screenwriting sources, (the Lord of the Rings has non-book scenes too), or every statistical bias present in the entirety of your source material. How dare you!

    (I don’t like emoticons, so: wink!)

     
  11. Patrick

    Thanks for the interesting analysis in this and Part 1.

    I have another data point for you to consider: books written in the first-person perspective versus those written in third-person. It seems to me that if a book is written in first person, you will end up with quite a lot of inner monologue, which will largely have to be dumped unless the movie is heavy on narration. Two examples of first-person series from the article are Twilight and Hunger Games.

    Twilight is first-person, but the events are generally so banal that the movies skip over or compress a great deal of them, while expanding the action elements which were quite brief and poorly sketched in the books. (Aside: yes, I’ve read and seen all of them, for reasons I can’t adequately describe. I just sort of felt compelled to keep going despite myself. Those of you reading this who have seen a Twilight movie may find it difficult to believe that the books could be less interesting, but take it from me, they are.)

    While Hunger Games has necessarily dumped a lot of non-action first-person scenes from the book, the movie made up for this in part by expanding its purview to include new scenes that would have been impossible to describe in the first-person novel, i.e. those where Katniss is not present, such as the control room for the games.

    Ultimately, analysing this aspect of the source material would require some breakdown and categorisation of chapters or sections into mainly action, mainly internal monologue and so on, and then comparing the movies on a scene-by-scene basis, so it’s a lot more work than it would be worth, but it could be intriguing to see if a movie’s WIBPSOM score and perceived quality of adaptation correlated with its ability not just to reproduce more words from the book on-screen, but to include the right kind of words.

     
  12. Mark

    Julio’s point is valid, especially with respect to LOTR. Even with over 9 hours, the theatrical releases omit too much of the story line; the extended versions manage to fix some of this, but there are still entire threads of the story missing.

    On the other hand, I also thought initially that there was not enough story in The Hobbit to last three feature-length films. That said, much of LOTR and its appendices is back story, necessary to both stories…my sense is that The Hobbit movie trilogy is filling in more pieces of the LOTR back story…the appearance of Radagast is but one example…

    It would be nice to have a fuller explanation of the Istari…
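
Grim_ungainly’s Gamma-distribution idea could be tried with a quick method-of-moments fit. A minimal sketch, assuming the WIBPSOM values are available as a plain list of positive floats (for example, exported from the Google Doc); the function name is hypothetical:

```python
import statistics


def gamma_moments_fit(values: list[float]) -> tuple[float, float]:
    """Method-of-moments estimates (shape k, scale theta) for a Gamma distribution.

    A Gamma(k, theta) variable has mean k * theta and variance k * theta**2,
    so k = mean**2 / variance and theta = variance / mean.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    if any(v <= 0 for v in values):
        raise ValueError("Gamma distribution requires strictly positive values")
    mean = statistics.fmean(values)
    var = statistics.variance(values)  # sample variance, n - 1 denominator
    if var == 0:
        raise ValueError("cannot fit a Gamma to identical values")
    return mean ** 2 / var, var / mean


# Hypothetical usage on the 59 WIBPSOM values:
# shape, scale = gamma_moments_fit(wibpsom_values)
```

Method of moments is simply the easiest estimator to write by hand; a maximum-likelihood fit (for example SciPy’s `scipy.stats.gamma.fit`) is the more usual choice, and either way the fitted curve would still need to be compared against the banded histogram before calling it a reasonable fit.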