It’s October, which means it’s time to subject the IMDb Top 250 Movies list to a level of quantitative scrutiny it probably doesn’t deserve. For those of you who are new to this series, here’s a quick recap: I’m four years into an effort to analyze changing movie tastes through lists of top movies.
In 2008, I compared the AFI Top 100 movies lists of 1998 and 2007 to the IMDb Top 250 movies list as of September 2008 and found the industry (AFI) list skewed towards older movies compared to the fan list (IMDb) and that the contents of the AFI list had only advanced by 5 years in the 9 years between the two editions.
In 2009 and 2010 I returned to the IMDb lists by taking snapshots of the list at around the same time of year as the initial analysis (late September/early October) and attempted to extrapolate some meaning from the changes in the lists’ composition over the years.
And now, in 2011, I’m adding a fourth year of IMDb data to the analysis! Will any trends emerge? Can we predict the future of the IMDb Top 250 list? Read on to find out.
Before we begin, I should state the obvious limitations to using the IMDb list as a tool for analyzing changes in movie tastes over time. The IMDb list is far from authoritative, and the potentially skewed demographics of its voters cast even more doubt on the validity of the list. Furthermore, the very exercise of assigning a single numerical rating to a movie is more than reductive; it’s borderline absurd. But put all of that aside for a moment. The IMDb list, for all its flaws, is well-known, and its ratings are accepted as being decent rough indicators of movie quality.
With that being said, let’s pick up where we left off last year, when I asserted that the IMDb list’s perceived bias towards newer movies was real, and getting worse over time. From last year’s article:
When I fired up Excel to do this analysis, I was pretty sure that I would find that the overall shift of the dataset in terms of median year would be greater than the concurrent shift in time. And I was right:
- Median Year of IMDb Top 250 List as of 9/30/2008: 1975
- Median Year of IMDb Top 250 List as of 10/18/2009: 1977 (jumps 2 years after 1 year)
- Median Year of IMDb Top 250 List as of 9/26/2010: 1981.5 (jumps 4.5 years after 1 year)
Yup, you read that right. Over the course of the last year, the median year of the IMDb Top 250 movies list increased by 4.5 years, which suggests that not only is the list’s bias towards newer movies still present, it’s intensifying over time.
So how does the most recent sampling fit in with this trend? The list as of 10/3/2011 had a median year of 1983.5, or a jump of 2 years after 1 year. So the tilt towards newer movies wasn’t as severe as it was from 2009 to 2010, but it was more than what you’d expect if the movies were evenly distributed from the minimum to the maximum for each year.
Let’s see how that plays out in chart and graph form:
|Year||Min||Max||Actual Median||Theoretical Median|
(Note: the “theoretical median” = what the median year of the list would be if the movies were evenly distributed between the oldest movie on the list and the newest.)
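To make that definition concrete, here is a minimal Python sketch. It assumes that "evenly distributed" means the theoretical median is simply the midpoint between the oldest and newest release years. The function names and the sample years are made up for illustration and are not the real IMDb data.

```python
from statistics import median


def theoretical_median(min_year, max_year):
    """Midpoint of the range: the median if titles were spread evenly across it."""
    return (min_year + max_year) / 2


def skew_toward_newer(years):
    """Actual median minus theoretical median; positive means a tilt toward newer movies."""
    return median(years) - theoretical_median(min(years), max(years))


# Hypothetical release years, for illustration only (NOT the actual list).
sample_years = [1931, 1955, 1972, 1994, 1999, 2003, 2008, 2011]

print(theoretical_median(min(sample_years), max(sample_years)))  # 1971.0
print(median(sample_years))                                      # 1996.5
print(skew_toward_newer(sample_years))                           # 25.5
```

A positive skew, as in this toy list, is the pattern described above: the actual median sits later than the midpoint of the range.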
I thought it would be fun to make a rudimentary projection as to what would happen to this list based on the trend of the last 4 years. So I looked at how the gap between the list year and the median year changed: from 2008 to 2009, the gap shrank from 33 to 32; from 2009 to 2010, it shrank from 32 to 28.5; and from 2010 to 2011, it shrank from 28.5 to 27.5. Each change can also be stated as a reduction factor (e.g., 32/33 = .9697). Averaging these three factors produces a single average reduction factor of .9417. Applying that factor to future years yields the shocking prediction that the list’s median year will essentially equal the year of the list in about 70 years:
OK, calm down folks. You don’t have to be a statistician to see the problems with this approach. First, intuitively, it makes no sense. Can you imagine a top movies list in the year 2070 that has as many movies on it from all years prior to 2069 as it does for the years 2069 and 2070? Second, and more importantly, a sample size of 4 is way too small to make this sort of projection. (Also, my little trick of averaging reduction factors probably isn’t sound math, but it worked in that it produced the nice, albeit erroneous, graph you see above.)
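For anyone who wants to reproduce the arithmetic, here is a rough Python sketch of the averaged reduction factor trick. The gap values come from the medians quoted above. The helper name and the 0.5-year cutoff for "essentially equal" are my own assumptions, and the whole thing inherits every flaw just described.

```python
# Gap between list year and median year, from the figures quoted above:
# 2008 - 1975, 2009 - 1977, 2010 - 1981.5, 2011 - 1983.5
gaps = [33, 32, 28.5, 27.5]

# Year-over-year reduction factors (e.g., 32/33), then their plain average.
factors = [later / earlier for earlier, later in zip(gaps, gaps[1:])]
avg_factor = sum(factors) / len(factors)


def years_until_gap_below(start_gap, factor, threshold):
    """Count how many yearly applications of the factor shrink the gap below threshold."""
    years = 0
    gap = start_gap
    while gap >= threshold:
        gap *= factor
        years += 1
    return years


print(round(avg_factor, 4))  # 0.9417

# Assumption: "essentially equal" means a gap of less than half a year.
print(years_until_gap_below(gaps[-1], avg_factor, threshold=0.5))
```

Because the gap shrinks by a constant factor each year, it never actually reaches zero, so the answer depends on where you put the cutoff. With a half-year cutoff the count lands in the high 60s, which is in line with the "about 70 years" figure.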
About that sample size. Unfortunately, I’ve only been doing this for four years. Now, if only there were some way to go back in time and capture the status of the list from previous years. If only…
But wait! Such a thing exists. Some fortuitous Google searches led me to the “IMDB Top 250 History” website, which has snapshots of lists going all the way back to April 1996.
Jackpot! I took additional snapshots from the same time frame, added them to the analysis, and…
…was disappointed to find that the trend totally disappeared with the expanded dataset:
|Year||Min||Max||Actual Median||Theoretical Median|
Turns out that in its earlier days, the IMDb Top 250 list was even more skewed towards newer movies than it is today, both in relative and absolute terms. And even with the larger sample size, the swings in median year make any sort of projection unfeasible, whether it’s with my fuzzy math method or a more formal linear regression analysis. And we haven’t even factored in IMDb’s changes and tweaks to its ranking algorithm over the years.
Sorry, folks, we can’t predict the future of the IMDb Top 250 Movies List through statistics. Or even make an educated guess. But we can still have a lot of fun analyzing the changes that have occurred to the list. Read on for more: