Monday, 18 April 2016

IMDb Top 250 made easy

I know that many are going to hate me for saying this. I might even receive death threats for doing this! But it has to be said, so I will say it. This post is about Maths!!
Okay, now the good news. This is a "Math does not suck" type of post and we are going to talk about movies, the IMDb Top 250 to be specific. For the uninitiated, IMDb is a database of (almost) every movie ever made. Anybody can review movies there and anybody can give a rating: 1 for the blood-curdlingly bad films and 10 for the ones you would want to recommend to your girlfriend. The IMDb Top 250 is the list of the 250 highest-rated movies as decided by the IMDb voters (anybody can be a voter). We will take a peek at the scary formula first and then we will see how easily you can arrive at it with some common sense (common sense is not so common, I know!).
Well, as Cracked magazine once wrote, our brains are not meant to instinctively understand any equation more complex than this:
Bear = Run Away
So don't worry if the formula for the IMDb Top 250 looks like the lyrics of that Salman Khan song PO Kendi Po Po (or whatever).
Here is the formula:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
Where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the Top 250 (currently 25,000)
C = the mean vote across the whole report
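If you like to see things run, here is a minimal Python sketch of the formula exactly as written above. The function name weighted_rating and the sample movie are made up for illustration, and C = 7.0 is just an assumed value for the overall mean, not a real IMDb figure.

```python
def weighted_rating(R, v, m, C):
    """Weighted rating (WR) = (v / (v+m)) * R + (m / (v+m)) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical movie: average rating 9.0 from 100,000 votes.
# m = 25,000 is the threshold quoted above; C = 7.0 is an assumption.
print(round(weighted_rating(R=9.0, v=100_000, m=25_000, C=7.0), 2))  # 8.6
```

The movie's own average is 9.0, but the weighted rating comes out a bit lower because it gets pulled towards C.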
What on earth is this? Do mathematicians have to spoil movie lists also? Hang on while I unpack this formula. Let us try to build a rating system.
Can't we just take the simple average of all votes? Let us say I am sitting with my friend Ganesh and Mukesh Ambani. I earn Rs 1 lakh, Ganesh gets 50K and Ambani has 500 lakh crores! What if we calculate the average income of this group? Our average would be 166 lakh crores or something. I don't even know how many zeroes there are in one lakh crore, but that is the average income! This is the problem with averages: extreme cases like Ambani distort them.
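Here is that arithmetic as a small Python sketch, using the same three incomes as in the story. The variable names are my own, and the only thing it shows is how one extreme value drags the average.

```python
LAKH = 100_000
CRORE = 10_000_000
LAKH_CRORE = LAKH * CRORE  # 1 followed by 12 zeroes

# Me, Ganesh and Ambani, in rupees (numbers from the example above).
incomes = [1 * LAKH, 50_000, 500 * LAKH_CRORE]

avg_with = sum(incomes) / len(incomes)
avg_without = sum(incomes[:2]) / 2  # same group, minus Ambani

print(round(avg_with / LAKH_CRORE, 2))  # 166.67 (in lakh crores)
print(avg_without)                      # 75000.0 (in rupees)
```

Take Ambani out and the average drops from about 166 lakh crores to 75 thousand rupees.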
There is another problem. Say you have to review a restaurant. Let us give it a rating based on some factors: Seats = 8/10, Tables = 9/10, Suppliers = 9/10, Plates = 10/10, Food = 1/10. The restaurant gets 37/50. If we just take the average, it gets a rating of 7.4/10. What do you think is wrong with this rating system? Equal weightage (importance) is given to everything!! If everything else is good but the food is terrible, the restaurant should not get a good rating, right? This is why in maths we have the concept of weightage. Go back to the equation and note the word "weighted".
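To see what weightage does to the score, here is a rough Python sketch using the same five scores. The weights are purely my assumption (food counts six times as much as each of the others); pick different weights and you get a different answer.

```python
# Scores out of 10, taken from the example above.
scores = {"seats": 8, "tables": 9, "suppliers": 9, "plates": 10, "food": 1}

# Assumed weights, for illustration only: food matters 6x as much.
weights = {"seats": 1, "tables": 1, "suppliers": 1, "plates": 1, "food": 6}

simple = sum(scores.values()) / len(scores)
weighted = sum(scores[k] * weights[k] for k in scores) / sum(weights.values())

print(simple)    # 7.4
print(weighted)  # 4.2
```

With equal importance the place scores 7.4. Once food carries more weight, the terrible food pulls the score down to 4.2.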
Say there is this guy, Mr Fan. The only film Mr Fan has seen in his life is Shahrukh Khan's Crappy New Year, so he might think that Crappy New Year is the greatest artistic achievement in the history of the universe. And we have Mr Roger Ebert, who has seen 12000 movies in 33 different languages. Both have voted for this film. Whose opinion should count more? Whose vote should get more weightage?
Take another case. Here is that intellectual and angry critic who watched 100 films and hated 99 of them. A 2/5 from him is considered very generous. Would you count on this guy's ratings? And there is this person who has watched 100 random films and has given 4/5 to 98 of them. Either he is one of those power-of-positive-thinking dudes or he is lying. Again, this person's ratings should not get much weightage. If you suddenly vote on a large number of films on the weekend that Dilwale came out, giving Dilwale the highest score, again we don't need a Sherlock Holmes to say what type of person you are!!
IMDb tries to give more weightage to neutral voters who have seen lots of films, and it tries to punish the votes from extreme cases.
Here is another scenario. Let us say we have 2 films. One is called Kejriwal and one is called Modi. These films have received 1000 votes each. What if all these voters were Namo fans? They would all give 10/10 to the Modi film and 1/10 (the lowest score) to the Kejriwal film. They wouldn't even bother to watch the films. What if these 1000 voters are the hyper liberals? They would give 1/10 to Modi with the fiery passion of a thousand blazing suns. The system would not be fair to either Modi or Kejri. We need more neutral voices to smooth out this type of extreme hatred.
If you were a mathematician, how would you solve this? How do you make the votes more neutral mathematically? Firstly, IMDb says that there should be at least 25000 votes. When we have 25000 votes we can assume that not all of them are going to be Namo fans or hyper liberals. Another way to solve the problem is to give every film some neutral votes by default, like a bonus. If we blindly give 20000 votes with a 7/10 rating, then these 1000 Namo haters cannot spoil the rating easily. They would need something like 20000 haters to outweigh the 20000 neutral votes, and that is difficult.
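The bonus-votes idea is easy to try out. Below is a minimal Python sketch with a made-up helper, average_with_neutral_votes, which assumes the haters give the lowest score of 1 and uses the 20000 neutral votes at 7/10 from the paragraph above.

```python
def average_with_neutral_votes(votes, neutral_count, neutral_value):
    """Plain average after mixing in some default 'neutral' votes."""
    total = sum(votes) + neutral_count * neutral_value
    return total / (len(votes) + neutral_count)


few_haters = [1] * 1_000    # 1000 haters, all voting 1/10
many_haters = [1] * 20_000  # it takes this many to really hurt

print(sum(few_haters) / len(few_haters))                           # 1.0
print(round(average_with_neutral_votes(few_haters, 20_000, 7), 2))  # 6.71
print(average_with_neutral_votes(many_haters, 20_000, 7))           # 4.0
```

Without the bonus the film sits at 1.0. With it, 1000 haters only drag the score from 7 to about 6.71, and even 20000 haters only get it down to 4.0. If you rearrange the formula given earlier, WR is the same as (v × R + m × C) ÷ (v + m), which is this same trick with m neutral votes of value C.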
Now what if we have 5 lakh voters? That would be the real verdict. We won't need any neutral votes in this case. Not all 5 lakh are going to be extremists. So basically, when there are only 1000 votes we want the neutral votes to matter, and when there are 5 lakh votes we don't want these neutral votes to count for much. This is exactly what the equation is doing. Here is the formula again: (v ÷ (v+m)) × R + (m ÷ (v+m)) × C. Let us dissect this formula. It looks like this:
Something x R + Something else x C.
C is the neutral vote which is blindly given by IMDb (the mean vote across all movies), and m is, in effect, how many of these neutral votes each movie gets. R is the actual average vote given by voters.
Something = v ÷ (v+m), which gets close to 1 when v is big (there are 5 lakh voters). Something else = m ÷ (v+m), which gets close to 1 when v is small (only 1000 voters). The two always add up to 1, so when one goes up the other goes down.
Put another way, it is actual votes x something + Neutral votes x something else. It is like a seesaw with Neutral votes on one side and actual votes on the other.
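To watch the seesaw tip, here is a small Python sketch that prints "Something" and "Something else" for the two cases in the text, 1000 voters and 5 lakh voters, with m = 25000. The function name weights is my own.

```python
def weights(v, m):
    """Return (Something, Something else) = (v/(v+m), m/(v+m))."""
    return v / (v + m), m / (v + m)


m = 25_000
for v in (1_000, 500_000):
    w_actual, w_neutral = weights(v, m)
    print(v, round(w_actual, 3), round(w_neutral, 3))

# 1000 0.038 0.962
# 500000 0.952 0.048
```

With only 1000 votes, the neutral side carries about 96% of the weight. With 5 lakh votes, the actual votes carry about 95%.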
A mathematician called Thomas Bayes (famous for Bayes's theorem) is the name behind the idea of starting with a neutral guess and updating it as evidence comes in. Hence this is called a Bayesian estimate. See! We arrived at this fancy formula just by using some common sense and 6th std level maths!! Isn't that cool?
