Landsat Image: An Application of PCA to Image Processing and Statistics

Many posts have provided interesting examples of applying matrix factorization and the singular value decomposition (SVD) to image processing and to multidimensional, or multivariate, data. Those discussions led me to look at principal component analysis (PCA), which has been addressed quite a few times but from different angles. PCA is used to analyze multivariate data, and it draws on orthogonal diagonalization and the singular value decomposition. In particular, I’m going to look at an application to image compression and dimension reduction of multivariate data: the imagery collected by the Landsat program.

The Landsat program was inspired by the Apollo moon-bound missions. William Pecora proposed the idea of placing remote sensing satellites in space in 1964, and his proposal became reality 8 years later, when Landsat 1 was launched in 1972. Since then, six other satellites have been launched. As of this writing, Landsat 7 is the most recent of them and provides the most technologically advanced data of the series. The Landsat satellites are jointly managed by NASA and the U.S. Geological Survey. For over three decades they have collected information about Earth from space by taking specialized digital photographs of Earth’s continents and surrounding coastal regions. Landsat images are useful for many purposes, enabling people to study various aspects of our planet and to evaluate the dynamic changes caused by both natural processes and human practices. Here’s an example of a Landsat 7 image, “Greek Fire Scars”:

greek-fire-scars.jpg
Image 1. The Landsat 7 image above is a composite of ETM+ bands 7, 4, and 2. This image was provided courtesy of USGS.
Source:  NASA Landsat Program Web site.

Sensors aboard the satellite acquire 7 simultaneous images of any region on Earth to be studied. The sensors record energy from separate wavelength bands: 3 in the visible light spectrum and 4 in the infrared and thermal bands. Each image is digitized and stored as a rectangular array of numbers, each number indicating the signal intensity at the corresponding pixel of the image. Each of the 7 images is one channel of a multichannel image. The 7 Landsat images of one fixed region typically contain much redundant information, since some features appear in several images. Yet other features, because of their color or temperature, may reflect light that is recorded by only 1 or 2 sensors. One objective of multichannel image processing is to view the data in a way that extracts information better than studying each image separately. Principal component analysis (PCA) is an effective way to accomplish this goal: it suppresses redundant information and provides, in only 1 or 2 composite images, most of the information from the initial data.
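To make the storage layout concrete, here is a minimal sketch of how a multichannel image can be arranged as a matrix of observations, one column per pixel. It assumes the image is held as a NumPy array of shape (rows, cols, channels); the function name `to_observation_matrix` and the random test scene are hypothetical stand-ins, not real Landsat data.

```python
import numpy as np

def to_observation_matrix(image):
    """Flatten a (rows, cols, channels) image into a (channels, rows*cols) matrix.

    Column k holds the channel intensities recorded at pixel k (row-major order).
    """
    rows, cols, channels = image.shape
    return image.reshape(rows * cols, channels).T

# Synthetic stand-in for a 7-channel scene (assumption: not real Landsat data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 7)).astype(float)

X = to_observation_matrix(image)
print(X.shape)  # (7, 20)
```

Each column of `X` is then one observation vector with 7 entries, which is the form the PCA discussion works with.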

Roughly speaking, the objective is to find a special linear combination of the images, i.e. a list of weights that, at each pixel, combine all 7 corresponding image values into 1 new value. The weights are chosen so that the range of light intensities (the scene variance) in the composite image, i.e. the first principal component, is greater than that in any of the original images. Additional component images can also be constructed, by the criteria described next.

For simplicity, assume that the matrix of observations $[X_1 \; \cdots \; X_N]$ is already in mean-deviation form. We want to find an orthogonal $p \times p$ matrix $P = [u_1 \; \cdots \; u_p]$ that determines a change of variable, $X = PY$, namely

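$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \cdots & u_p \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix}$$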

with the property that the new variables $y_1, y_2, \ldots, y_p$ are uncorrelated and are arranged in order of decreasing variance.

The orthogonal change of variable above means that each observation vector $X_k$ receives a “new name” $Y_k$ such that $X_k = PY_k$. Then $Y_k$ is the coordinate vector of $X_k$ with respect to the columns of $P$, and $Y_k = P^{-1}X_k = P^T X_k$ for $k = 1, \ldots, N$. Let $S$ denote the covariance matrix of the original observations $X_1, \ldots, X_N$. It is a standard result of elementary linear algebra that, for any orthogonal $P$, the covariance matrix of $Y_1, \ldots, Y_N$ is $P^T S P$. So the desired orthogonal matrix $P$ is one that makes $P^T S P$ diagonal. Let $D$ be the diagonal matrix with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ of $S$ on the diagonal, arranged so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, and let $P$ be an orthogonal matrix whose columns are the corresponding unit eigenvectors $u_1, u_2, \ldots, u_p$. Then $S = PDP^T$ and $P^T S P = D$.
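The covariance claim can be checked in a few lines. The derivation below is a sketch that assumes the usual sample covariance convention $S = \frac{1}{N-1} X X^T$ for the mean-deviation matrix $X = [X_1 \; \cdots \; X_N]$; that normalization is an assumption on my part, since the post does not state it. Note that $Y = P^T X$ is also in mean-deviation form, because each row of $Y$ is a linear combination of zero-mean rows of $X$.

```latex
\begin{aligned}
S_Y &= \frac{1}{N-1}\, Y Y^{T}, \qquad Y = [\,Y_1 \;\cdots\; Y_N\,] = P^{T} X \\
    &= \frac{1}{N-1}\, (P^{T} X)(P^{T} X)^{T} \\
    &= P^{T} \left( \frac{1}{N-1}\, X X^{T} \right) P \\
    &= P^{T} S P .
\end{aligned}
```

With $S = PDP^T$ and $P^T P = I$, this gives $S_Y = P^T (PDP^T) P = D$, so the new variables are uncorrelated and have variances $\lambda_1, \ldots, \lambda_p$.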

 

In this case, the unit eigenvectors $u_1, u_2, \ldots, u_p$ of the covariance matrix $S$ are the principal components of the data (in the matrix of observations). The first principal component is the eigenvector corresponding to the largest eigenvalue of $S$, the second principal component is the eigenvector corresponding to the second largest eigenvalue, and so on. The first principal component $u_1$ determines the new variable $y_1$ in the following way. Let $c_1, \ldots, c_p$ be the entries in $u_1$. Since $u_1^T$ is the first row of $P^T$, the equation $Y = P^T X$ shows that $y_1 = u_1^T X = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p$. Hence $y_1$ is a linear combination of the original variables $x_1, \ldots, x_p$, using the entries in the eigenvector $u_1$ as weights. In a similar fashion, $u_2$ determines $y_2$, and so on.

The values of $x_1$, converted to a gray scale between black and white, produce a photograph of the first channel; the values of $x_2$ produce a photograph of the second channel, and so on. In the same way, the values of $y_1$ at each pixel produce the first principal component image. The following images display the first, second and third components, respectively, of a common set of data.

blog-image2.jpg

Image 2. Source: PCA in Image Processing (Mudrová, Procházka).
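As a rough illustration of how such a component image could be produced, the sketch below computes $y_1$ at every pixel and rescales it linearly to an 8-bit gray scale. It assumes a NumPy array of shape (rows, cols, channels) and the sample covariance with an $N-1$ denominator; the function name `first_component_image` and the synthetic three-channel scene are hypothetical, and this is not the procedure used to make the images above.

```python
import numpy as np

def first_component_image(image):
    """Return the first principal component of a (rows, cols, channels) image
    as an 8-bit grayscale array of shape (rows, cols)."""
    rows, cols, channels = image.shape
    X = image.reshape(-1, channels).T.astype(float)  # channels x pixels
    X -= X.mean(axis=1, keepdims=True)               # mean-deviation form
    S = X @ X.T / (X.shape[1] - 1)                   # sample covariance matrix
    _, eigenvectors = np.linalg.eigh(S)              # eigenvalues in ascending order
    u1 = eigenvectors[:, -1]                         # unit eigenvector of the largest eigenvalue
    y1 = u1 @ X                                      # one weighted combination per pixel
    # Linearly rescale y1 to the range 0..255 (black to white).
    span = y1.max() - y1.min()
    gray = np.zeros_like(y1) if span == 0 else (y1 - y1.min()) / span * 255.0
    return gray.reshape(rows, cols).astype(np.uint8)

# Synthetic scene: two strongly correlated channels plus one noise channel.
rng = np.random.default_rng(1)
base = rng.random((6, 8))
image = np.stack(
    [base, 0.8 * base + 0.1 * rng.random((6, 8)), rng.random((6, 8))],
    axis=2,
)
gray = first_component_image(image)
```

The sign of an eigenvector is arbitrary, so the resulting image may come out as the photographic negative of what another implementation produces; the information content is the same.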

 

 

At a more general level, recall the steps of PCA:

- obtain data
- subtract the mean
- compute covariance matrix
- compute eigenvectors and eigenvalues of the covariance matrix
  This step is pivotal: the eigenvector with the highest eigenvalue is the first principal component, and it tells us the direction along which the data vary most.
- choose components and form a feature vector:
  FeatureVector = (eig1 eig2 eig3 … eign)
- derive the new data:
  FinalData = transpose(FeatureVector) * transpose(MeanAdjustedData)
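The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: it assumes one observation per row and one variable per column, and the function name `pca`, its `n_components` parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def pca(data, n_components):
    """PCA on `data` (one observation per row, one variable per column).

    Returns (final_data, feature_vector, eigenvalues), with components
    ordered by decreasing eigenvalue.
    """
    # Subtract the mean of each variable.
    mean_adjusted = data - data.mean(axis=0)
    # Compute the covariance matrix (columns are the variables).
    cov = np.cov(mean_adjusted, rowvar=False)
    # Compute eigenvalues and eigenvectors; eigh suits symmetric matrices
    # and returns eigenvalues in ascending order, so reorder them.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Choose components and form the feature vector (eigenvectors as columns).
    feature_vector = eigenvectors[:, :n_components]
    # Derive the new data: transpose(FeatureVector) * transpose(MeanAdjustedData).
    final_data = feature_vector.T @ mean_adjusted.T
    return final_data, feature_vector, eigenvalues

# Synthetic data: the second variable is almost a multiple of the first.
rng = np.random.default_rng(42)
t = rng.normal(size=200)
data = np.column_stack(
    [t, 2.0 * t + 0.1 * rng.normal(size=200), rng.normal(size=200)]
)
final_data, feature_vector, eigenvalues = pca(data, n_components=2)
print(final_data.shape)  # (2, 200)
```

Each row of `final_data` holds the values of one new variable across all observations, so keeping only the first few rows is what reduces the dimension.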

As with other image compression techniques, once PCA has found the patterns in the data, we can compress the data by reducing the number of dimensions without losing much of the information. PCA is potentially valuable for applications in which most of the variation, or dynamic range, in the data is due to variations in only a few of the new variables $y_1, \ldots, y_p$. In the orthogonal change of variable $X = PY$ described earlier, left-multiplication by $P$ does not change the lengths of vectors or the angles between them, so the change of variable does not alter the total variance of the data. This implies that if $S = PDP^T$, then [total variance of $x_1, x_2, \ldots, x_p$] = [total variance of $y_1, y_2, \ldots, y_p$] = $\operatorname{tr}(D) = \lambda_1 + \lambda_2 + \cdots + \lambda_p$.

The variance of $y_j$ is $\lambda_j$, and the quotient $\lambda_j / \operatorname{tr}(S)$ measures the fraction of the total variance that is explained by $y_j$. These fractions are the percentages of variance displayed in the principal component images.
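A short sketch of this bookkeeping follows, assuming the covariance matrix $S$ is available as a symmetric NumPy array. The helper names `explained_variance_ratio` and `components_needed`, the 2×2 example matrix, and the target thresholds are hypothetical choices for illustration.

```python
import numpy as np

def explained_variance_ratio(S):
    """Return lambda_j / tr(S) for each eigenvalue of S, largest first."""
    eigenvalues = np.linalg.eigvalsh(S)[::-1]  # eigvalsh returns ascending order
    return eigenvalues / np.trace(S)

def components_needed(ratios, target):
    """Smallest number of leading components whose ratios sum to at least `target`."""
    cumulative = np.cumsum(ratios)
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, len(ratios))

# Small symmetric example with eigenvalues 3 and 1 (trace 4).
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
ratios = explained_variance_ratio(S)
print(ratios)                          # close to [0.75, 0.25]
print(components_needed(ratios, 0.7))  # 1
```

In this toy case the first new variable alone carries about three quarters of the total variance, which is the kind of figure used to decide how many component images are worth keeping.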

 

 

Finally, the simplicity of PCA can also be its weakness: synthetic variables more complex than linear combinations might yield a more efficient model for describing the data. PCA can be generalized in many ways, mostly based on non-linear transforms of the original variables. A few examples are:

Curvilinear Component Analysis (CCA) - it is more general: without requiring knowledge of the data distribution beforehand, it uses non-linear mappings so that non-Gaussian distributed data can be represented accurately.

Independent Components Analysis (ICA) - it generates new variables that are not only uncorrelated, as with PCA, but also genuinely independent.

PCA on latent variables - it describes data by linear combinations of a small number of unobserved variables.

References:

ACCESS. Generalizations of Principal Components Analysis. Retrieved from http://www.aiaccess.net

A Tutorial on Principal Components Analysis. Retrieved from http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

Landsat NASA Web site. http://landsat.gsfc.nasa.gov/

Lay, David. Linear Algebra and its Applications.

Landsat program. Wikipedia. Retrieved from http://en.wikipedia.org/wiki/Landsat_program

Mudrová, M., Procházka, A. PCA in Image Processing. Retrieved from http://phobos.vscht.cz/konference_matlab/MATLAB05/prispevky/mudrova/mudrova.pdf

 

 

 

 
