In our latest project in class, we’re using singular value decomposition to infer a light source from a given light-blocking shape and the resulting shadow it casts on a paper surface. As soon as I read the project description, I was excited about working on this project – it’s almost like being a spy, using two very similar images and some clever math to uncover seemingly hidden data about how these images relate. After all, at first glance, it seems like it would be very difficult to calculate with great accuracy the exact light source that creates this shadow.
![]()
So as I pondered the mysterious and exciting possibilities that singular value decomposition offers in this problem, I began wondering what other cool information SVD could illuminate in other contexts. The technique is fairly simple in concept: it decomposes a complex matrix into a product of multiple, simpler matrices. This process can expose components and patterns that are not evident in the single product matrix. After some unsuccessful attemps to find the jargon-ish phrase “singular value decomposition” used in popular and interesting news articles, I instead ended up locating several academic papers that proposed some very different and interesting real-world applications of SVD.
There are all kinds of uses for singular value decomposition in modern mathematical contexts. Some of the places it pops up include numerical weather prediction, calculating metabolic pathways in microorganisms, digital watermarking, measuring apparent brightness temperatures of solar bursts in space, analysis of auscultatory sounds for diagnosis of diseases, and a lot more. However, I decided to focus on a topic that uses SVD that I’m sure all of us are familiar with: collaborative filtering.
Huh? “Collaborative filtering?”
Perhaps some of you receive periodic annoying emails from Netflix or Amazon with “your personal recommendations” for movies and products you may enjoy based on what you’ve bought and how you’ve rated your previous purchases. Little did you know that before that email gets sent out, a whole lot of SVD math has to happen first!
Collaborative filtering works by analyzing the preferences of an individual user, comparing that user’s preferences with individuals with similar preferences, and predicting additional products or services in which the user might be interested. For example, if I log on to Netflix and add Indiana Jones, ET, and Jaws to my movie queue, I’ll probably get a pesky email within the week suggesting that I give Jurassic Park a try as well, since most other Spielberg fans probably rated all four movies highly.
Most collaborative filtering algorithms group users into “neighborhoods” by similarity of preferences, and this creates a much larger data pool to use for preference calculation than solely the data provided by the individual user. So we can think of each user as a point in n-dimensional space, plotted by their numerical preferences about a subset of all distinct products. A little thought and common sense indicate that before the algorithm can recommend products to a user with any reliable accuracy, a user must specify or rate a few different products first. (If I only rent Star Wars IV - VI, should Netflix recommend other sci-fi movies, other movies directed by George Lucas, other movies filmed in part in Marin County, CA, other movies starring Carrie Fisher, or other trilogies? There’s not enough information for the algorithm to make reliable guesses until I give it more data.)
So how does SVD come into play? One popular way to simplify the calculations and increase accuracy is to represent each preference as a linear combination of that product’s characteristics, as my examples suggest, and which is the very foundation of SVD. (So perhaps Star Wars is 70% sci-fi, 40% Carrie Fisher, 100% George Lucas, 10% Marin County, etc.) After I rent a few more movies, the collaborative filtering algorithm will have some very helpful information about what traits of movies I really like: so if I add The Matrix, Blade Runner, and 2001: A Space Odyssey to my movie queue, the SVD of my matrix of preferences about movie traits will have a very high value in the coordinate representing the sci-fi genre. The beauty of this system of decomposing preferences into characteristics means that the algorithm is able to acquire a lot of informative data about what a user likes with relatively little information. (There are a lot fewer users that have rated all four of the particular sci-fi movies I listed above than there are users who have rated four movies tagged as “sci-fi,” so using the feature method works more quickly and accurately.)

So now for some raw math. Michael H. Pryor of Dartmouth College describes the math involved in his paper, “The effects of singular value decomposition on collaborative filtering.” User rating vectors are stored in an m x n matrix A, where there are m users and n products. Singular value decomposition factors A into USVT, U and V orthogonal matrices, and S a diagonal matrix containing the singular values of A. U represents each user’s responses to certain features of products; V represents how much of each feature is present in a given product. The S matrix helps relate feature importance to the overall determination of a rating.
So using this math, a rating of any product can be calculated as follows…
A = USVT
Aij = Σk Σl Uik Skl (VT)jl
Aij = Σk Σl Uik Skl Vjl
… where Uik is user i’s preference for feature k, Skk is the importance of feature k in determining a rating for product j, and Vjl the amount of feature k present in product j. Since Sij = 0 whenever i ≠ j, we can simplify to the following equation…
Aij = Σk Uik Skk Vjk
… and so given the corpus of data for all users and some information about a particular user, it is fairly straightforward to calculate the likelihood that a given user will enjoy a given product or service based on the patterns of preferences of the entire population. This is merely an introduction to the math and computational theory behind the ever-growing industry of online individual user recommendations, but if you are interested in more details, see Pryor’s paper (the first link under “collaborative filtering,” below).
————————-
SOURCES:
SVD:
http://alias-i.com/lingpipe/demos/tutorial/svd/read-me.html
http://en.wikipedia.org/wiki/Singular_value_decomposition
Collaborative filtering:
http://www.cs.dartmouth.edu/reports/TR98-338.pdf
http://ieeexplore.ieee.org/iel5/10218/32584/01524053.pdf
http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
Numerical weather prediction:
http://www.weatherchaos.umd.edu/papers/danforth-kalnay-svd.pdf
http://en.wikipedia.org/wiki/Numerical_weather_prediction
Metabolic pathways in microorganisms:
http://findarticles.com/p/articles/mi_qa3938/is_200302/ai_n9182203
Digital watermarking:
http://www.actapress.com/PaperInfo.aspx?PaperID=31036&reason=500
Solar bursts in space:
http://ntrs.nasa.gov/search.jsp?R=934994&id=5&qs=Ne%3D25%26N%3D49%2B126
Auscultatory sounds:
http://www.patentstorm.us/patents/7300405-fulltext.html






Leave a Comment
You must be logged in to post a comment.
* You can follow any responses to this entry through the RSS 2.0 feed.