Stepping slightly outside Scientific Computing and into statistics, I would like to present a short discussion of why Least Squares is so widely used. Aside from being easy to calculate and an intuitively reasonable measure of best fit, it has some very useful theoretical properties.
The problem of linear regression is as follows: you want to model $y_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + \dots + B_k x_{ik} + u_i$, where $B_0$ to $B_k$ are constant coefficients, $x_{i1}$ to $x_{ik}$ are your variables, $u_i$ is the error term, and $y_i$ is what you want to model. The matrix equivalent is $Y = XB + u$. As an example, suppose we think salaries are affected by age and education, so we might want to model $\text{salaries}_i = b_0 + b_1 \cdot \text{age}_i + b_2 \cdot \text{education}_i + u_i$. The $u_i$ terms are called the errors, and they represent the part of each salary that is explained by factors other than age and education. Applying Least Squares to the observable data set of salaries, age and education gives us $\widehat{\text{salaries}}_i = \hat{b}_0 + \hat{b}_1 \cdot \text{age}_i + \hat{b}_2 \cdot \text{education}_i$, where $\hat{b}_0$, $\hat{b}_1$ and $\hat{b}_2$ are estimates of $b_0$, $b_1$ and $b_2$. The difference between the real, observable $\text{salaries}_i$ and the Least Squares fitted $\widehat{\text{salaries}}_i$ is called the residual, $\hat{u}_i$. Note that the residuals are observable, while the errors are not. We also know from the mechanics of the Least Squares method that the estimates $\hat{b}_j$ are exactly the values that minimize the sum of squared residuals, $\sum_i \hat{u}_i^2$.
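To make the salary example concrete, here is a minimal sketch in Python using NumPy. The data is synthetic and entirely made up for illustration (the "true" coefficients, sample size and noise level are assumptions, not real salary figures), and the helper name `fit_ols` is hypothetical. It fits the coefficients with `np.linalg.lstsq` and computes the residuals.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficient estimates and residuals for y = X b + u."""
    # lstsq finds the b that minimizes the sum of squared residuals ||y - X b||^2
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b_hat
    return b_hat, residuals

# Synthetic data: every number below is an illustrative assumption
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 65, size=n)
education = rng.uniform(8, 20, size=n)        # years of education
u = rng.normal(0, 5000, size=n)               # unobserved errors
true_b = np.array([10000.0, 500.0, 2000.0])   # b0, b1, b2

# Design matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones(n), age, education])
salaries = X @ true_b + u

b_hat, residuals = fit_ols(X, salaries)
print("estimates of b0, b1, b2:", b_hat)
print("sum of squared residuals:", np.sum(residuals ** 2))
```

Any other choice of coefficients gives a larger sum of squared residuals on the same data, which is what "least squares" means in practice.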
Without going too deeply into the math behind it, the Gauss-Markov Theorem states that in a linear model (as above) in which the errors (again, the $u_i$ terms) have expectation zero, are uncorrelated and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the least-squares estimator. This means that, given the assumptions, the expected value of each $\hat{b}_j$ is exactly equal to $b_j$ (unbiasedness), and that among all linear unbiased estimators the least-squares estimator has the smallest variance, and hence the smallest mean squared error relative to the true $b_j$. The theorem is also remarkably strong, as it does not rely on the errors being normally distributed or even sharing the same distribution. Additionally, the errors do not have to be independent, since being uncorrelated is a much weaker condition to satisfy. Finally, if the assumption of equal variances (also called homoskedasticity) is not met, ordinary Least Squares remains unbiased but is no longer best. There are, however, adjustments to the method (such as Generalized Least Squares) that yield estimates that are again BLUE, provided the covariance structure of the errors is known.
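The Gauss-Markov claims can be checked informally with a small simulation. The sketch below is only an illustration under assumed settings (a single regressor, made-up true coefficients, and uniform rather than normal errors). It compares the least-squares slope with another estimator that is also linear and unbiased, namely the slope of the line through the first and last data points, which I picked purely as a convenient competitor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: one regressor on a fixed grid, made-up true coefficients
n, reps = 50, 5000
x = np.linspace(0.0, 10.0, n)
b0, b1 = 1.0, 2.0

# Non-normal errors: uniform on [-sqrt(3), sqrt(3)], so mean 0 and variance 1
half_width = np.sqrt(3.0)
errors = rng.uniform(-half_width, half_width, size=(reps, n))
Y = b0 + b1 * x + errors              # each row is one simulated data set

# Least-squares slope for every data set: sum((x - xbar) * y) / sum((x - xbar)^2)
x_centered = x - x.mean()
ols_slopes = (Y @ x_centered) / np.sum(x_centered ** 2)

# Competing linear unbiased estimator: slope through the two endpoints
endpoint_slopes = (Y[:, -1] - Y[:, 0]) / (x[-1] - x[0])

print("mean of least-squares slopes:", ols_slopes.mean())
print("mean of endpoint slopes:     ", endpoint_slopes.mean())
print("variance of least-squares slopes:", ols_slopes.var())
print("variance of endpoint slopes:     ", endpoint_slopes.var())
```

Both estimators should average out close to the true slope, but the least-squares slopes should be noticeably less spread out, even though the errors are not normally distributed.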