As we learned in class and as was mentioned in a previous post on BLAS, there are three levels to the standard. Level 1 was written in the early 1970s (though only published in 1979) and provides vector operations; Level 2 was created in the mid-1980s and covers matrix-vector operations; and Level 3 was created in the late 1980s and covers matrix-matrix operations.
Curious to see some performance results, I found the following charts showing the performance of ATLAS (Automatically Tuned Linear Algebra Software, an open-source, self-tuning implementation of BLAS) on two BLAS functions:
DGEMM (BLAS 3): Double precision General Matrix-Matrix multiplication
DGEMV (BLAS 2): Double precision General Matrix-Vector multiplication
At first, the dip in performance for the BLAS 2 DGEMV might seem surprising. However, Stefan Jaschke, the author of the benchmark description, notes that the dip occurs once the matrix no longer fits in the L2 cache. Beyond that point, performance is “memory-bound”: limited not by processor speed but by memory bandwidth.
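To get a feel for where such a dip might appear, here is a rough back-of-the-envelope sketch that estimates the largest square double-precision matrix that fits in a cache of a given size. The 256 KiB figure is purely my assumption for illustration; the benchmark machine's actual L2 size is not stated here, and the estimate ignores the vectors and anything else competing for cache.

```python
import math

BYTES_PER_DOUBLE = 8  # size of one IEEE 754 double-precision value


def matrix_bytes(n: int) -> int:
    """Memory footprint in bytes of an n x n matrix of doubles."""
    return BYTES_PER_DOUBLE * n * n


def largest_square_matrix_in_cache(cache_bytes: int) -> int:
    """Largest n such that an n x n matrix of doubles fits in cache_bytes."""
    return math.isqrt(cache_bytes // BYTES_PER_DOUBLE)


# Hypothetical cache size, chosen only for illustration.
ASSUMED_L2_BYTES = 256 * 1024

n_max = largest_square_matrix_in_cache(ASSUMED_L2_BYTES)
print(n_max)  # 181
```

Under that assumption, a matrix much larger than about 181 x 181 would have to stream from main memory on every DGEMV call, which is the regime where memory bandwidth becomes the limit.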
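Also notable is the higher speed of the BLAS 3 test relative to the BLAS 2 test. This mainly comes down to how much work DGEMM and DGEMV do per value loaded. DGEMV operates on a matrix and a vector, performing on the order of n^2 floating-point operations on about n^2 values, where n is the dimension of the matrix. DGEMM operates on matrices only, performing on the order of n^3 operations on about 3n^2 values, so there is roughly a factor of n more arithmetic to do on each value brought in from memory.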
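One way to put numbers on this comparison is to count floating-point operations per matrix or vector element touched. The sketch below uses the usual textbook counts for square n x n operands (about 2n^2 operations for DGEMV, about 2n^3 for DGEMM); the function names are my own, and the counts ignore the alpha and beta scaling terms.

```python
def dgemv_intensity(n: int) -> float:
    """Approximate flops per element for y = A*x + y with an n x n matrix A."""
    flops = 2 * n * n          # one multiply and one add per matrix entry
    elements = n * n + 2 * n   # A, plus the vectors x and y
    return flops / elements


def dgemm_intensity(n: int) -> float:
    """Approximate flops per element for C = A*B + C with n x n matrices."""
    flops = 2 * n ** 3         # n multiply-add pairs per entry of C
    elements = 3 * n * n       # A, B and C
    return flops / elements


for n in (10, 100, 1000):
    print(n, round(dgemv_intensity(n), 2), round(dgemm_intensity(n), 2))
```

The DGEMV ratio stays below 2 no matter how large n gets, while the DGEMM ratio grows as 2n/3, so a matrix-matrix product has far more opportunity to reuse each value once it has been loaded.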
Also, as noted by Per-Olof Persson, BLAS 3 implementations are usually written to be cache-aware, so they use memory more efficiently than BLAS 2 operations:
(Image taken from his lecture notes, linked below)
This cache efficiency in BLAS 3 may be why there is no dip in DGEMM performance above. Although it handles as much data as DGEMV or more, the BLAS 3 implementation is limited by processing speed rather than memory bandwidth, thanks to better cache use.
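To illustrate what "cache-aware" can mean in practice, here is a minimal sketch of a blocked (tiled) matrix multiply. This is not ATLAS's actual kernel, and the block size is a hypothetical tuning parameter of my own choosing. In pure Python it will not show any speedup; it only shows the loop structure, where each small tile is reused many times before the code moves on.

```python
def matmul_blocked(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows.

    The three outer loops walk over tiles of size `block`; the three inner
    loops do an ordinary multiply restricted to one tile of each matrix.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):            # tile rows of C
        for j0 in range(0, m, block):        # tile columns of C
            for p0 in range(0, k, block):    # tiles along the shared dimension
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_blocked(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

In a compiled implementation the block size would be chosen so that the tiles being worked on fit in cache together, which is the kind of parameter an auto-tuning library can search for on each machine.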
Resources:
ATLAS BLAS Benchmarks by Stefan Jaschke (2002)
BLAS Lecture by Per-Olof Persson (2006)





