Reference Article:
http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=SJOCE3000008000006000899000001&idtype=cvips&prog=normal
(Note: The article is a little beyond my math backgroud, but I will do my best to summerize the results.)
After completing Project 1, and working on project 2, it’s become painfully obvious that Matlab suffers from an inability to run on multiple cores. With todays computers, 2 processors is a standard, with many people having 4. Within the next year or two, Intel is expected to release an 8 core processor, one example being the Xeon X5365 V8. But, how do we use these 8 cores to maximize speed? For intense computations, being able to share the work among 2, 4, or even 8 processors can greatly increase the speed of the computation.
Research has been done for along time to find ways to speed up intense calculations. In the already mentioned article, QR factorizations can be performed quickly, and the final computation to find the QR matrix can be done without linear products. Without linear products, there is a distinct advantage to complicating problems which lead to simple computations to produce the desired result.
The first step is to have a Generator of A, G_alpha(A), which is defined as a matrix pair X = (x_1, x_2, …, x_alpha) , Y = (y_1, y_2, … , y_alpha) where {x_i, y_i} are vectors that detemine A. Alpha corresponds to length of the generator G_alpha(A). QR factors can be found by computing G(T) = {V,W} given G(T’T) = {W,WE} (T is a specific type of matrix in the paper, instead of using A). Following some math I dont understand, Theorem 1 produces a QR factorization. Another result of the ability to find QR factors is that it can also find LDL’ (of nonsingular symmetric matricies, i.e. matricies that have an inverse).
Algorithms are given to find QR factors. The promising part is that once the generator of A, G_alpha(A) is calculated, no significant calculations are required to find Q and R. Thus, after the compuation, Q and R can be tasked off to different processors to be computed. The running time of the algorithm mentioned is as follows:
Computing the generator is: 2m + mn + 1 flops
Computing R: 6n^2 - 3n - 3 flops
Computing Q: 12m(n-1) flops
Thus, not only can QR factorization be done in O(n^2) time, it can also be parallelized across multiple processors for faster computation.
As computer scientists today, there is an increasing need to take typical algorithms and provide algorithms that can run on multiple processors to help speed up performance. Even though Amdahl’s Law (http://en.wikipedia.org/wiki/Amdahl’s_law) states that we cannot find an exact correlation between number of processors and speedup of the algorithm, speeding up our algorithms 15~20% makes a significant difference.






Leave a Comment
You must be logged in to post a comment.
* You can follow any responses to this entry through the RSS 2.0 feed.