An account of my recent CS322-induced wiki-clicking session:
Recently we have been spending a good part of our time together exploring various ways to decompose a matrix. We have found many ways to factor matrices, including
QR: 
Cholesky: 
and LU: 
All of which are mostly used to solve linear systems of equations,
.
More recently, we have been using a alternate matrix decomposition technique to solve equations of another form, called singular value decomposition, or equations of the form

where sigma is the singular value and u and v are the left-singular and right-singular vectors. The decomposition of M in SVD looks like

where Σ contains the singular values and U and V contain the singular vectors. You may recognize the singular value equations, as they are in fact similar to eigenvalue decomposition, or spectral decomposition,

where both SVD and eigenvalue problems involve finding basis directions for which scalar multiplication is equivalent to matrix multiplication (SVD is more general since it does not require that the matrix be square).
But what if we said that we have had enough with matrix decomposition? What if we wanted to solve an eigenvalue problem without doing any matrix decomposition? Then I would say look no further than power iteration. Power iteration is an excellent way to compute the largest eigenvalue of a very large sparse matrix, since it does not require any decomposition and relies solely on an iterative process

Eventually, these b vectors will slowly converge to an eigenvector which can be used to calculate the dominant eigenvalue via

Ok, but why in the world would anyone want to find only one eigenvalue of a very large and sparse matrix using a slow iterative method? You do, every time you Google something.
PageRank as you may or may not be aware is a probability distribution of the likelihood of randomly browsing to a website used by Google to measure a website’s relative importance. This eventually influences how the websites are listed on a Google search. It essentially a measure of how important other sites view the site in question by looking at how often other sites link to it.
The algorithm itself is an iterative method (like power iteration) that can be applied to a collection of documents of any size (like a large, sparse, non-square matrix) to arrive at a singular dominant value (like an eigenvalue). By looking at a simplified representation of the PageRank algorithm one can see how the iterative power method is put to good use for your web-browsing pleasure:

where PR(u) is the PageRank of website u, Bu is theset of all sites that link to u, and L(v) is the number of links from page v.
On an interesting (CS322 related) note, a group of people over at PhysicsBits have taken the estimation that PageRank is somehow related to the log-base-10 of the number of inward-linking sites and done some least squares fitting of

to some sample data and determined that a~1.12, as can be seen in the chart below:

Finally, there are a few alternatives to PageRank, one of particular interest being the HITS (Hypertext Induced Topic Selection) algorithm developed by Cornell’s own Jon Kleinberg. HITS is a link analysis algorithm (like PageRank) that determines two values (different from PageRank) that measure the value of the content of the page (authority) and the value of its links to OTHER pages (hub value).
Like PageRank, the algorithms for determining the values of interest are determined via iterative (and in HITS case mutually recursive) methods. Unlike PageRank, HITS is executed at query time (instead of at index time) and is intended for use on a small subset of ‘relevant’ documents (instead of the entire web). Because of the conceptual similarity between PageRank and HITS, it has been suggested by some that Kleinberg’s work was actually an inspiration to Page and Brin for their PageRank concept.
So the next time you see the ‘Rebel King’ Jon Kleinberg or , more likely his little brother the ‘Rebel Prince’ Robert Klienberg since most of us take his class right before 322, let him know that in a parallel universe he is really the ‘Rebel King’ of Silicon Valley and rolling in the billions.
Sources:
http://en.wikipedia.org/wiki/Matrix_decomposition
http://en.wikipedia.org/wiki/Power_method
http://en.wikipedia.org/wiki/Page_rank
http://www.sciencebits.com/OnGooglePR
http://en.wikipedia.org/wiki/HITS_algorithm






Leave a Comment
You must be logged in to post a comment.
* You can follow any responses to this entry through the RSS 2.0 feed.