Netflix Prize

The Netflix Prize is a competition that began on October 2, 2006, sponsored by Netflix in order to improve its movie-matching algorithms. If you’re not yet familiar, Netflix is a company that specializes in direct-mail movie rental, meaning you simply ask for a list of movies online and Netflix will mail them straight to you. As you finish watching the movies, you mail them back, and Netflix sends the next one on your list.

One of the highlights of their service is their ‘Cinematch’ service, which will try to recommend movies to you based on your reported preferences over a short list of movies. The Netflix Prize focuses on finding an improvement to the Cinematch algorithm, offering a $1 million prize to any individual or group that manages to improve upon Cinematch by 10% or more, according to the root mean-squared (RMSE) error metric. If you will recall, if X is our estimator and Y is the estimand, then RMSE = Sqrt( (1/n) SUM( Yi-Xi )^2). If X is the true rating that some user gives for a particular movie (from 1 to 5), and Y is our predicted rating for that user-movie pair, then Cinematch’s RMSE is currently about 0.9525, and a winning submission would have to get below 0.8563.

So how do you actually go about winning a million dollars? What follows is a brief mathematical outline of the problem, and a couple of ways that we can use what’s taught in CS322 to attack it. You can essentially view Netflix’s problem as one of interpolating missing values in a very large matrix. Imagine a matrix with movies down the rows and columns for each user. Netflix has supplied maybe 1-2% of the values; based on those, fill in the rest of the matrix.

Well, let’s cheat and use SVD! How, you might ask? After all, your matrix is basically 99% empty, and not in the sense that the rest of the values are 0’s, but because they’re UNKNOWN. A simple way is as follows: Let’s say you want a rank-40 SVD of your 99%-unknown user-movie matrix. Well, initialize some matrices of the right size, and calculate the product. Then calculate the norm of the difference between the product and our original matrix, ignoring all unknown values. Then run a simple gradient descent algorithm. The details are in HOW you do your gradient descent, and how you initialize the original SVD matrices.

This is actually a very cheap and simple trick, but it was good enough for a Netflix prize contestant to beat Netflix’s Cinematch algorithm by a small margin already. Although the current leaderboard for the Netflix prize is far ahead of this approach, and brain-dead-simple SVD methods are no longer adequate,  it’s surprising how little work the decomposition takes to get ‘ok’ results.

http://alias-i.com/lingpipe/demos/tutorial/svd/read-me.html

http://sifter.org/~simon/journal/20061211.html

http://www.netflixprize.com/leaderboard

http://www.netflixprize.com/assets/ProgressPrize2007_KorBell.pdf

http://www.netflixprize.com/faq

http://en.wikipedia.org/wiki/Netflix_Prize

http://en.wikipedia.org/wiki/Mean_squared_error

Posted in Topics: Uncategorized

Jump down to leave a comment.

Leave a Comment

You must be logged in to post a comment.



* You can follow any responses to this entry through the RSS 2.0 feed.