Statistical Analysis of Movie Title Lengths

[Updated 5/24/2011: I went back and added axis titles to the charts that were missing them, but more importantly, I added a couple different takes on the data that edge closer to statistical significance. – Lee]

If recent films have taught me anything, it’s that animal film titles should be as short as possible. “Hop” “Rio” If Mark Lee were around I’d ask him to graph length of title against total profit.

-Overthinking It Commenter “cat”, April 18, 2011

Well, fortunately for all of you, I am in fact around. And I brought my statistics with me.

Cat raises an interesting question. Titles are important parts of movies, and the people who pick them surely do so quite deliberately and with the goal of picking one that will help sell movie tickets. Is there some “sweet spot” for movie title lengths that movie makers go for to maximize profits?

Not really.

First order of business: fulfill Cat’s request that I “graph length of title against total profit.” Movie profit is a hard number to get at, so I’ll go with the next best thing, U.S. domestic box office receipts over the last 10 years.

I took a trip to BoxOfficeMojo.com, pulled down their lists of the top 150 (U.S. domestic) grossing movies for the last 10 years, and came up with this:

I love it when a scatter plot comes together, even when its R squared is pathetically small.

Don’t let the trend line fool you: this is a horribly weak positive correlation (hence the minuscule 0.02 R squared value). In other words, the length of a movie’s title has almost nothing to do with the movie’s box office success.
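For anyone who wants to check an R squared like this by hand, here is a minimal Python sketch. It is not the spreadsheet work behind the chart; it assumes title length is counted as characters including spaces and punctuation (which matches the counts quoted later in the post), and the rows in `sample` are invented placeholders, not BoxOfficeMojo data.

```python
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation, which equals R^2 for a one-variable linear fit."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Invented placeholder rows: (title, domestic gross in millions of dollars).
sample = [
    ("Short", 120.0),
    ("A Medium Title", 95.0),
    ("A Much Longer Title: The Subtitle", 210.0),
    ("Tiny", 60.0),
]
lengths = [len(title) for title, _ in sample]
grosses = [gross for _, gross in sample]
print(f"R squared = {r_squared(lengths, grosses):.3f}")
```

An R squared near 0, like the 0.02 above, means a straight line through the points explains almost none of the spread in box office receipts.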

What about the other part of Cat’s comment: do animal movies typically have shorter titles than non-animal movies? For this piece of analysis, I narrowed my focus to the last year with a full dataset on BoxOfficeMojo.com, 2010. Sorry, Cat, but the average and median title lengths for what I define as “animal movies” come out to 21.7 and 27 characters, respectively, both above the average and median for all the 2010 movies in the dataset: 15.3 and 13.

The stats don’t bear this out, but one would think that animal films, marketed largely towards children, should have short titles that are easier for kids to whine to their parents to ask them to see. Hence the thinking that short titles like Rio and Hop would be common practice. That may turn out to be the case for 2011, but as far as 2010 is concerned, shorter movie titles do not correlate to animal movies. Damn you, Legend of the Guardians: The Owls of Ga’Hoole (45) and Cats & Dogs: The Revenge of Kitty Galore (40)!

Thinking about 2011 animal movies versus 2010 animal movies then got me thinking: is there any discernible trend in movie title lengths over the past 10 years? Has Hollywood been shortening titles on us in an effort to both save space on marketing materials and appeal to an audience with ever-shrinking attention spans? Or maybe they’re getting longer due to hyphenated titles of sequels and the need to use longer phrases to come up with a title that hasn’t already been taken?

No such trend could be found. Lengths have fluctuated a bit over the last decade, but by and large they have remained constant:

Average and Median Length of Title (characters)

Year        Average   Median
2001        14.87     13
2002        16.05     15
2003        16.34     14
2004        16.63     15
2005        15.79     14
2006        15.73     14
2007        15.29     13
2008        15.96     13
2009        16.35     14.5
2010        15.27     13
2001-2010   15.83     14
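A per-year summary like this one can be sketched in a few lines of Python. This is only an illustration under assumptions: the input is taken to be a list of `(year, title)` pairs (a made-up layout, not the format of the raw data), and length is counted with `len()`, spaces and punctuation included. The four example titles are ones named in this post, not a full year of data.

```python
from collections import defaultdict
from statistics import mean, median


def length_stats_by_year(rows):
    """rows: iterable of (year, title). Returns {year: (mean length, median length)}."""
    lengths_by_year = defaultdict(list)
    for year, title in rows:
        lengths_by_year[year].append(len(title))
    return {
        year: (mean(lengths), median(lengths))
        for year, lengths in sorted(lengths_by_year.items())
    }


# A handful of titles mentioned in the post; far too few to reproduce the table.
rows = [
    (2010, "Legend of the Guardians: The Owls of Ga'Hoole"),
    (2010, "Cats & Dogs: The Revenge of Kitty Galore"),
    (2011, "Rio"),
    (2011, "Hop"),
]
for year, (avg, med) in length_stats_by_year(rows).items():
    print(year, avg, med)
```

Reporting the median next to the average is useful here because a few very long titles pull the average up without moving the median much.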

Drat! No funky correlations? No trends? Then what’s an amateur statistician to do?

Straight-up data analysis, I suppose.

Check this one out: here’s a distribution of movie title lengths for the top 150 grossing movies for 2001-2010:

No big surprise here. There’s a big chunk of movies in the 6-10 range, and most of the remainder is made of longer movie titles that go all the way out to Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan (83). Of course, the joke subtitle is rarely used when referring to this movie, so we have to skip past several entries to find the longest title for a movie that doesn’t have a subtitle or a way to easily and clearly refer to it by shorthand: Indiana Jones and the Kingdom of the Crystal Skull (50). Now, I know this particular title may bring back bad memories for some of you and that this title could be referred to by a shorthand name such as Crystal Skull, so if you’d prefer something else as the longest “stand alone” title, then I offer you Harold and Kumar Escape from Guantanamo Bay (43).

Either way, you’re still left with an outlier among outliers in the “long title” realm. With its 14-character median, Hollywood clearly prefers the short title to the long title.
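The bucketing behind a distribution like this could look something like the sketch below. The bucket width of 5 is an assumption inferred from the “6-10 range” mentioned above, and the function names are hypothetical; the three titles are just a toy input.

```python
from collections import Counter


def bucket_label(length, width=5):
    """Map a title length to a label such as '6-10' (buckets start at 1)."""
    low = ((length - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def length_histogram(titles, width=5):
    """Count how many titles fall into each length bucket."""
    return Counter(bucket_label(len(title), width) for title in titles)


# Toy input using titles mentioned in the post and comments.
histogram = length_histogram(["Rio", "Hop", "Avatar"])
for label, count in histogram.items():
    print(label, count)
```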

Now, I know we’re all hoping for some sort of mind-blowing statistical discovery in movie title lengths, but after running through multiple regressions/data slices, I failed to find anything of note. No significant correlations could be found between movie title length and a variety of variables: box office take, MPAA rating, IMDB rating all turned up laughably poor results.

What the heck. Here’s my attempt at correlating movie title length and IMDB rating. R squared value of .0033:

Weak! I know, right? So that’s why I’m offering up all of my raw data to you crazy people to see if you can data mine any insights that I failed to uncover. Note that only the 2010 dataset has some extra data like MPAA rating, IMDB rating, etc.

If you find anything interesting, post it in the comments and/or email me your stuff at [email protected].

Updated 5/24/2011: Commenter “corbmac” took me up on the challenge and did his own number crunching! He wrote:

while there’s no correlation apparent in the first graph, there definitely appears to be a frontier ie the longer your title, the lower the highest gross for films at that length.
i banded the films by length in multiples of 6 letters (initially did it with multiples of 5 letters but avatar was distorting the results) and the correlation coefficient with the max gross for films in that band was -0.91.

After he kindly sent me his Excel work, I reformatted the scatter plot to be consistent with the rest of the graphs (just without the trend line as it was making the graph look funky):

Whoa! R squared of .912?!? Are we onto something here? Well, maybe, but I had some questions on his methodology. Specifically, why group by six? Is that sound stats? And why take the average of the min and max of each group and use that as the value (e.g., the first group, 1-6, takes a value of 3)?
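To make the banding step concrete, here is a rough Python sketch of grouping titles into bands of six characters and keeping the top gross in each band. It is a guess at the general idea, not corbmac’s actual Excel work: the band boundaries (1-6, 7-12, and so on) are assumed, the function name is hypothetical, and the gross figures are invented placeholders.

```python
def max_gross_by_band(movies, width=6):
    """movies: iterable of (title, gross). Returns {(low, high): highest gross in that band}."""
    best = {}
    for title, gross in movies:
        low = ((len(title) - 1) // width) * width + 1
        band = (low, low + width - 1)
        best[band] = max(gross, best.get(band, gross))
    return dict(sorted(best.items()))


# Invented placeholder grosses, in millions of dollars.
movies = [("Rio", 10.0), ("Avatar", 50.0), ("Up", 30.0), ("Toy Story 3", 40.0)]
for band, gross in max_gross_by_band(movies).items():
    print(band, gross)
```

Changing `width` shows how sensitive the result is to the band size: with only a handful of bands, one blockbuster landing in a different band can swing the correlation a lot.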

So I ran the numbers by social scientist and friend of the site Yael, and she took a slightly different approach. Instead of grouping, she just took the maximum gross for each movie title length. Her explanation:

If you look at every number of letters, instead of clumping them, you still get a significant correlation between the max grossing film with each number of letters and number of letters, but a smaller one, with R= -0.325 and R^2 = 0.106, or about 10% of the variance in max gross accounted for by number of letters in the name.
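The ungrouped approach described in that quote can be sketched as follows: keep the highest gross at each exact title length, then correlate length with that maximum. This is an illustration under assumptions, not Yael’s actual analysis; the function names are hypothetical and the rows are invented placeholders built to fall on a straight line.

```python
from math import sqrt


def max_gross_by_length(movies):
    """movies: iterable of (title, gross). Returns {title length: highest gross}."""
    best = {}
    for title, gross in movies:
        n = len(title)
        best[n] = max(gross, best.get(n, gross))
    return best


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def frontier_correlation(movies):
    """Return (r, r squared) between title length and the top gross at that length."""
    best = max_gross_by_length(movies)
    lengths = sorted(best)
    r = pearson_r(lengths, [best[n] for n in lengths])
    return r, r * r


# Invented placeholder rows: the top gross falls steadily as the title gets longer.
movies = [("aa", 30.0), ("aa", 10.0), ("aaaa", 20.0), ("aaaaaa", 10.0), ("aaaaaa", 5.0)]
r, r2 = frontier_correlation(movies)
print(f"R = {r:.3f}, R^2 = {r2:.3f}")
```

The relationship between the two quoted numbers is just squaring: an R of -0.325 gives an R^2 of about 0.106, which is where the “about 10% of the variance” figure comes from.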

Wow. That’s a lot of scatter plots, and a lot of statistics that are honestly over my head. I won’t be updating this post with any more reader contributions, but comments are absolutely still welcome!

20 Comments on “Statistical Analysis of Movie Title Lengths”

  1. David #

    The correlation you got in the first graph: I figure it might be entirely caused by the fact that sequels have ungodly long titles, and sequels are also correlated with big Hollywood studio films. So sequels with long names are basically indicators of a franchise, and that typically happens only when there’s money to be won.

     
  2. Kevin #

    What about number of Academy Award nominations/wins? Any correlation between that and movie title length?

     
  3. cat #

    :( I was so blinded by the recent movie entries that I forgot to consider the extended titles for sequels. Thanks for trying anyway. Glad to have you back from vacation. …I wonder if results would improve if you removed sequels. That is, to initially appeal to audiences is a title usually shorter?

     
  4. DrSylvan #

    Using the “official” title of the movie isn’t necessarily a perfect indicator, because few people refer to each movie by the full subtitle. For example, almost no one called “Transformers 2” “Transformers 2: Revenge of the Fallen” (or the more apt “Transformers 2: Michael Bay’s Id”), and same goes for “Eclipse” (not “The Twilight Saga: Eclipse”) or any Harry Potter movie.

    Your numbers are heavily influenced by the crowd of top-grossing franchises that are never, in practice, referred to by the full name.

    Nevertheless, I don’t think there’d be a meaningful correlation between conversational title length and revenue, because all of the big movies will be striving to have a similarly short name on the tip of the moviegoer’s tongue. I’m sure various marketing suits somewhere study how to get a title (which is functionally a brand name) to be exactly short enough for easy recollection, or to craft ads so that an abbreviated version is what sticks in the audience’s brains.

     
  5. Johann #


    Now, I know we’re all hoping for some sort of mind-blowing statistical discovery in [insert dataset on social topic here], but after running through multiple regressions/data slices, I failed to find anything of note. No significant correlations could be found between [social topic of interest] and a variety of variables.

    That describes an absolutely typical experience of any (social) science researcher.

     
  6. Rosa #

    Units, Mark!

     
  7. Evan #

    R^2 of 0% !?!? You should really re-over think this graph.

     
  8. Mark #

    How do you use BoxOfficeMojo.com to get all that data? Did you have to go movie-by-movie, or do you have to have a premier account to do a custom search? I wanted to do a correlation between your title length data and runtime, but I’m too lazy to do it manually.

     
    • Lee #

      Copy-paste got me the lists of top 150 grossing movies per year. For 2010 only I manually data-entered IMDB ratings and MPAA ratings. It actually went faster than you’d think; the key is having enough screen real estate (2 high res monitors) to be able to flip between two windows quickly.

      So yeah, manual data entry.

      The things I would do for direct access to the IMDB DB…

       
  9. corbmac #

    while there’s no correlation apparent in the first graph, there definitely appears to be a frontier ie the longer your title, the lower the highest gross for films at that length.
    i banded the films by length in multiples of 6 letters (initially did it with multiples of 5 letters but avatar was distorting the results) and the correlation coefficient with the max gross for films in that band was -0.91.

     
    • Lee #

      Very cool. Can you send me the results, so I can post on the site? lee at overthinking it dot com

       
  10. Richard #

    As a test of the methodology, might I suggest doing a similar correlation with Amazon’s Top 100 Books?

     
  11. Gab #

    SPSS?

     
  12. Carl #

    I think this study could be an excellent illustration of one of the biggest problems in research today – mining the data for results.

    While all of us know that “correlation is not causation”, it is often hard to internalize the concept. When we see a correlation, it’s often hard to remember that some results are just statistical accidents…

     
    • petrlesy #

      1) i believe there is no such thing as “statistical accident”. the root cause might not be known, there might be confounding factors but the results are always possible to explain
      2) we don’t exactly see a correlation in this case, do we?

       
  13. Mark #

    You should seriously label your axes. Even when they are obvious. One out of three is not okay. Come on, I demand professionalism.

     
  14. Lee #

    OK, OK, sorry for the lack of axis labels. I’ve updated the original graphs, plus added 2 new ones submitted by readers! Enjoy, and keep the comments coming.