Today in class we began discussing the method of Least Squares, a technique for approximating a function from a set of (noisy) data points. The idea is relatively straightforward: given some basis function(s) as a starting point, find the combination of them that minimizes the sum of the squared errors at the data points. However, several concerns quickly arise:
1. How do we know we’re using proper basis functions? If we’re plotting something with a clearly defined expected behavior, it’s easy. For example, in a physics lab that measures the force on an object at different accelerations, we can recall F = ma and expect the relationship between F and a to be linear. Other times, choosing a basis takes some intuition. Some of these formulas were probably derived empirically in the first place, so how did the original experimenters know what to use? One important fact to consider is that a surprising number of things in nature follow relatively simple rules or patterns. If the data trace a simple shape such as a line or a parabola, there’s a good chance the relationship really is linear or quadratic, respectively. A good rule of thumb here is Occam’s Razor - when in doubt, pick the simplest function that approximates the data well. For example, a line with fairly small error is more likely to be close to the underlying function than, say, an interpolating polynomial that passes through every data point exactly (prepare for some wild oscillations and crazy predictions if you go this route).
2. Do we have enough sample points from a large enough range? Consider the second example in the book on page 499. It proposes, “Suppose, for instance, that we want to fit a table of values (x_k, y_k), where k = 0, 1, …, m, by a function of the form y = a ln(x) + b cos(x) + c e^x in the least-squares sense.” The book then provides a table of ten x-y pairs to which the function should be fit. Clearly this is just an example meant to demonstrate how the calculations work. In practice, though, it would be awfully risky to assume a function of this form without very good reason. The resulting plot shows a strange, roughly sinusoidal curve that, admittedly, fits the data quite nicely. Regardless, I can propose a function of my own that fits the points almost equally well: a line (especially if you treat the leftmost point as an outlier). To decide which is better, we’d probably want more data points, both within and outside the current range. For example, how would we estimate the value of the function at x = 10? The book’s fit, driven by its e^x term, predicts a value of about 676, while my linear fit predicts something like 7.
3. Should we even be using the method of Least Squares? One might argue that my previous example proves it’s a bad idea to use Least Squares when little data is available. I’d argue it’s still a useful tool; we just can’t be as confident in the fit as we would be with more points. If we really want the best function we can get from the given data, Least Squares still seems to be one of the best ways to find it. The method definitely has its shortcomings, though. In particular, it’s quite sensitive to outliers - even one point that deviates considerably from the general trend contributes to the total error in proportion to, you guessed it, the square of its deviation, so the fitted function bends toward that point to keep the term small. A family of variations on Least Squares, called robust regression, tries to address this. Many such methods have been devised (least absolute deviations and Huber regression are two well-known examples), but no single algorithm solves the outlier problem in every case.
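To make these concerns a bit more concrete, here is a minimal pure-Python sketch of least squares with an arbitrary list of basis functions. It is my own illustration, not the book’s code: the function names (`least_squares`, `theil_sen`) and the data are hypothetical, namely ten points exactly on y = 2x + 1, then the same points with the last one pushed 30 units off the line. The fit solves the normal equations (A^T A)c = A^T y, which is fine for a tiny example but can lose accuracy when the basis functions are nearly dependent; in practice an orthogonal-factorization solver such as `numpy.linalg.lstsq` is the safer choice. For contrast, the sketch also includes the Theil-Sen estimator, one simple robust-regression method, which fits a line using the median of all pairwise slopes.

```python
import math
from statistics import median


def least_squares(basis, xs, ys):
    """Fit y ~ sum_j c[j] * basis[j](x) by solving the normal equations (A^T A) c = A^T y."""
    n = len(basis)
    # Design matrix A: one row per data point, one column per basis function.
    A = [[phi(x) for phi in basis] for x in xs]
    # Build the n-by-n system M c = b with M = A^T A and b = A^T y.
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n):
                M[r][col] -= f * M[k][col]
            b[r] -= f * b[k]
    # Back substitution on the upper-triangular system.
    c = [0.0] * n
    for k in reversed(range(n)):
        c[k] = (b[k] - sum(M[k][col] * c[col] for col in range(k + 1, n))) / M[k][k]
    return c


def theil_sen(xs, ys):
    """Robust line fit: median of all pairwise slopes, then the median intercept."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    m = median(slopes)
    return median([y - m * x for x, y in zip(xs, ys)]), m


# A straight-line basis: y = c0 * 1 + c1 * x.
line = [lambda x: 1.0, lambda x: float(x)]
# The book's model y = a ln(x) + b cos(x) + c e^x is just a different basis list;
# fitting it would need the book's table of data, which is not reproduced here.
book_basis = [math.log, math.cos, math.exp]

# Hypothetical data: ten points exactly on y = 2x + 1, then one outlier at x = 9.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys_bad = ys[:]
ys_bad[-1] += 30

print(least_squares(line, xs, ys))      # approximately [1.0, 2.0]
print(least_squares(line, xs, ys_bad))  # approximately [-3.36, 3.64]: one point tilts the whole line
print(theil_sen(xs, ys_bad))            # (1.0, 2.0): the medians ignore the outlier
```

Swapping in a different model only means changing the basis list, which is why the choice of basis matters so much: the solver will happily fit whatever functions it is given. Here a single bad point moves the least-squares slope from 2 to about 3.64, while Theil-Sen still recovers the original line.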
An interesting source that prompted some of the issues I raised here is the book “Practical Least Squares” by Ora Miner Leland, which I stumbled upon while searching Google. Although it was published in 1921 (by a former Cornell professor, incidentally), it raises some good points about when to use Least Squares and when to be cautious, particularly in the “Conclusion” chapter.