Statistical Analysis of Movie Title Lengths

Spoiler alert: weak R squared values ahead.

[Updated 5/24/2011: I went back and added axis titles to the charts that were missing them, but more importantly, I added a couple different takes on the data that edge closer to statistical significance. – Lee]

If recent films have taught me anything, it’s that animal film titles should be as short as possible. “Hop” “Rio” If Mark Lee were around I’d ask him to graph length of title against total profit.

-Overthinking It Commenter “cat”, April 18, 2011

Well, fortunately for all of you, I am in fact around. And I brought my statistics with me.

Cat raises an interesting question. Titles are important parts of movies, and the people who pick them surely do so quite deliberately and with the goal of picking one that will help sell movie tickets. Is there some “sweet spot” for movie title lengths that movie makers go for to maximize profits?

Not really.

First order of business: fulfill Cat’s request that I “graph length of title against total profit.” Movie profit is a hard number to get at, so I’ll go with the next best thing: U.S. domestic box office receipts over the last 10 years.

I took a trip to, pulled down their lists of the top 150 (U.S. domestic) grossing movies for the last 10 years, and came up with this:

I love it when a scatter plot comes together, even when its R squared is pathetically small.

Don’t let the trend line fool you: this is a horribly weak positive correlation (hence the minuscule 0.02 R squared value). In other words, the length of a movie’s title has almost nothing to do with the movie’s box office success.
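If you want to check a number like this yourself, here is a minimal sketch of how R squared falls out of a two-variable comparison. The function name and the toy numbers are made up for illustration; they are not the box office data behind the chart.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical title lengths and grosses (in $ millions), NOT real data.
title_lengths = [3, 6, 11, 14, 22, 45]
grosses = [140, 95, 310, 60, 180, 55]

r = pearson_r(title_lengths, grosses)
r_squared = r ** 2  # for a straight-line fit on one variable, R squared is r squared
print(f"r = {r:.3f}, R squared = {r_squared:.3f}")

# Working backwards: an R squared of 0.02 means |r| is only about 0.14.
print(f"|r| for an R squared of 0.02: {sqrt(0.02):.2f}")
```

Read that last line as: title length "explains" roughly 2% of the variation in gross, which is next to nothing.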

What about the other part of Cat’s comment: do animal movies typically have shorter titles than non-animal movies? For this piece of analysis, I narrowed my sights on just the last year with a full dataset on, 2010. Sorry, Cat, but the average and median title lengths for what I define as “animal movies” come out to 21.7 and 27, respectively. Both are above the average and median for all the 2010 movies in the dataset: 15.3 and 13, respectively.

The stats don’t bear this out, but one would think that animal films, marketed largely towards children, should have short titles that are easier for kids to whine about when begging their parents to take them. Hence the thinking that short titles like Rio and Hop would be common practice. That may turn out to be the case for 2011, but as far as 2010 is concerned, animal movies do not correlate with shorter titles. Damn you, Legend of the Guardians: The Owls of Ga’Hoole (45) and Cats & Dogs: The Revenge of Kitty Galore (40)!
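For the record, the counts in parentheses match what you get when you count every character in the title, spaces and punctuation included. A quick sketch, using only titles quoted in this post (the variable names are mine, and the mean and median at the end cover just these four titles, not the 2010 animal-movie set):

```python
from statistics import mean, median

# Titles quoted in this post; a straight or curly apostrophe counts as one character either way.
titles = [
    "Hop",
    "Rio",
    "Legend of the Guardians: The Owls of Ga'Hoole",
    "Cats & Dogs: The Revenge of Kitty Galore",
]

# len() counts every character, including spaces and punctuation.
lengths = {title: len(title) for title in titles}
for title, length in lengths.items():
    print(f"{length:>3}  {title}")

# Summary stats for these four titles only.
print("mean:", mean(lengths.values()))
print("median:", median(lengths.values()))
```

Two long subtitles are enough to drag a small group's average way up, which is pretty much what happened to the 2010 animal movies.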

Thinking about 2011 animal movies versus 2010 animal movies then got me thinking: is there any discernible trend in movie title lengths over the past 10 years? Has Hollywood been shortening titles on us in an effort to both save space on marketing materials and increase appeal to an audience with ever-decreasing attention spans? Or maybe they’re getting longer due to hyphenated titles of sequels and the need to use longer phrases to come up with a title that hasn’t already been taken?

No such trend could be found. Lengths have fluctuated a bit over the last decade, but by and large they have remained constant:

Year         Average Length of Title   Median Length of Title
2001         14.87                     13
2002         16.05                     15
2003         16.34                     14
2004         16.63                     15
2005         15.79                     14
2006         15.73                     14
2007         15.29                     13
2008         15.96                     13
2009         16.35                     14.5
2010         15.27                     13
2001-2010    15.83                     14
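
As a sanity check on the table, here is a small sketch that recomputes the decade average from the yearly averages and fits a straight line through them. It assumes every year contributes the same number of films (150 per year, per the lists described above), which is what makes a plain mean of the yearly means valid. The variable names are mine.

```python
# Yearly average title lengths, copied from the table above.
yearly_average = {
    2001: 14.87, 2002: 16.05, 2003: 16.34, 2004: 16.63, 2005: 15.79,
    2006: 15.73, 2007: 15.29, 2008: 15.96, 2009: 16.35, 2010: 15.27,
}

# Assumption: equal film counts per year, so the decade average is the mean of the yearly averages.
decade_average = sum(yearly_average.values()) / len(yearly_average)
print(round(decade_average, 2))  # 15.83, matching the 2001-2010 row

# Least-squares slope of average length against year: characters gained (or lost) per year.
years = list(yearly_average)
values = list(yearly_average.values())
mean_year = sum(years) / len(years)
mean_value = sum(values) / len(values)
slope = (
    sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    / sum((y - mean_year) ** 2 for y in years)
)
print(f"slope: {slope:+.4f} characters per year")
```

The slope comes out very close to zero, which fits the "no trend" reading. The same trick does not work for the median column: a decade median can't be rebuilt from yearly medians without the underlying titles.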

Drat! No funky correlations? No trends? Then what’s an amateur statistician to do?

Straight-up data analysis, I suppose.

Check this one out: here’s a distribution of movie title lengths for the top 150 grossing movies for 2001-2010:

No big surprise here. There’s a big chunk of movies in the 6-10 range, and most of the remainder is made up of longer movie titles that go all the way out to Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan (83). Of course, the joke subtitle is rarely used when referring to this movie, so we have to skip past several entries to find the longest title for a movie that doesn’t have a subtitle or a way to easily and clearly refer to it by shorthand: Indiana Jones and the Kingdom of the Crystal Skull (50). Now, I know this particular title may bring back bad memories for some of you and that this title could be referred to by a shorthand name such as Crystal Skull, so if you’d prefer something else as the longest “stand alone” title, then I offer you Harold and Kumar Escape from Guantanamo Bay (43).

Either way, you’re still left with an outlier among outliers in the “long title” realm. With its 14-character median, Hollywood clearly prefers the short title to the long title.

Now, I know we’re all hoping for some sort of mind-blowing statistical discovery in movie title lengths, but after running through multiple regressions/data slices, I failed to find anything of note. No significant correlations could be found between movie title length and a variety of variables: box office take, MPAA rating, IMDB rating all turned up laughably poor results.

What the heck. Here’s my attempt at correlating movie title length and IMDB rating. R squared value of .0033:

Weak! I know, right? So that’s why I’m offering up all of my raw data to you crazy people to see if you can data mine any insights that I failed to uncover. Note that only the 2010 dataset has some extra data like MPAA rating, IMDB rating, etc.

If you find anything interesting, post it in the comments and/or email me your stuff at [email protected].

Updated 5/24/2011: Commenter “corbmac” took me up on the challenge and did his own number crunching! He wrote:

while there’s no correlation apparent in the first graph, there definitely appears to be a frontier ie the longer your title, the lower the highest gross for films at that length.
i banded the films by length in multiples of 6 letters (initially did it with multiples of 5 letters but avatar was distorting the results) and the correlation coefficient with the max gross for films in that band was -0.91.

After he kindly sent me his Excel work, I reformatted the scatter plot to be consistent with the rest of the graphs (just without the trend line as it was making the graph look funky):

Whoa! R squared of .912?!? Are we onto something here? Well, maybe, but I had some questions on his methodology. Specifically, why group by six? Is that sound stats? And why take the average of the min and max of each group and use that as the value (e.g., the first group, 1-6, takes a value of 3)?
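
For anyone trying to follow along at home, the banding step described above might look something like this. It is a sketch under my own assumptions, not corbmac's actual Excel work: the film data is made up, the function name is hypothetical, and I round the band midpoint down so that the 1-6 band gets the value 3 mentioned above.

```python
from collections import defaultdict

BAND_WIDTH = 6  # corbmac's choice of band size


def band_midpoint(length, width=BAND_WIDTH):
    """Map a title length to the rounded-down midpoint of its band (1-6 -> 3)."""
    low = ((length - 1) // width) * width + 1
    high = low + width - 1
    return (low + high) // 2


# Hypothetical (title length, gross in $ millions) pairs, NOT real data.
films = [(3, 140), (6, 760), (9, 400), (12, 320), (15, 290), (20, 210), (45, 55)]

# Keep only the highest gross seen in each band.
max_gross_by_band = defaultdict(int)
for length, gross in films:
    band = band_midpoint(length)
    max_gross_by_band[band] = max(max_gross_by_band[band], gross)

for band in sorted(max_gross_by_band):
    print(band, max_gross_by_band[band])
```

Notice how few points survive: each band collapses to a single number, and with only a handful of points a strong-looking correlation is much easier to come by.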

So I ran the numbers by social scientist and friend of the site Yael, and she took a slightly different approach. Instead of grouping, she just took the maximum gross for each movie title length. Her explanation:

If you look at every number of letters, instead of clumping them, you still get a significant correlation between the max grossing film with each number of letters and number of letters, but a smaller one, with R= -0.325 and R^2 = 0.106, or about 10% of the variance in max gross accounted for by number of letters in the name.
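
Here is a rough sketch of that "max gross at each exact length" step, plus the arithmetic connecting the two numbers in the quote. The film data is made up for illustration, and this is my reading of the approach, not Yael's actual workings.

```python
# Hypothetical (title length, gross in $ millions) pairs, NOT real data.
films = [(6, 760), (6, 120), (9, 400), (9, 95), (15, 290), (15, 60), (45, 55)]

# Keep only the top-grossing film at each exact title length (no banding).
max_gross_by_length = {}
for length, gross in films:
    max_gross_by_length[length] = max(gross, max_gross_by_length.get(length, 0))

print(max_gross_by_length)

# Squaring the reported correlation gives the share of variance accounted for.
r = -0.325
print(round(r ** 2, 3))  # 0.106, i.e. about 10%
```

The sign of R tells you the direction (longer title, lower ceiling on gross), while R squared tells you how much of the variation that relationship accounts for.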

Wow. That’s a lot of scatter plots, and a lot of statistics that are honestly over my head. I won’t be updating this post with any more reader contributions, but comments are absolutely still welcome!