Greatest Squares Fitting

We’ve been talking about least squares fitting of functions to data points lately in lectures, and this reminded of the decision boundaries used in supervised machine learning. The two are similar in that they both try to “fit” a function (i.e. decision boundary for machine learning) to a given set of data points, though with obviously very different error functions. Our recent, torturous (just kidding ;)), homework problem 12.1.14 particularly reminded me of how decision boundaries are made in SVMs (Support Vector Machines), a particularly effective machine learning method. I thought it might be interesting to provide some info on how the analogy to “fitting” in least squares is actually the process of “learning” in supervised machine learning, and which for the case of SVMs is actually more like greatest squares.

The basic idea in supervised machine learning is about trying to “fit” a function or decision boundary to a training data set so that we can use it to accurately classify any new examples we may meet in the future. One simple way of trying to understand this is to think that we are trying to classify a group of examples with attributes x into one of two labels, which for simplicity we usually just call +1 and -1 (binary classification). Then we try to learn or fit a “good” hypothesis function h(x) using the training data set for which we already know each element’s classification. These hypothesis functions h(x) are the different decision boundaries (lines, planes, or hyperplanes actually depending on the dimensions) which basically decide how to classify a new example depending on which side of the boundary it is on. If the new example is on the positive side, classify it as +1, else -1. Exactly what +1 and -1 mean naturally depend upon the actual problem and the actual definition of a “good” h(x) is also a big question. One simple (though flawed) criteria by which to decide how “good” an h(x) is, is to simply try to find an h(x) which best fits the training data set, so as to minimize training error, or in other words, the error rate of h(x) when applied back onto the training data.

Hard SVM

The actual methods of how we learn h(x) are an entire field of study and CS478 is one of the whole courses on this subject (very interesting class btw, if your interested in this stuff). The SVMs I mentioned above is one of those methods. It basically involves trying to “fit” a decision boundary in between the positive(+) and negative(-) examples in the training data set while trying to maximize a margin between the two sides. This is done by trying to maximize the distance from the outliers of both positive and negative sides to the decision boundary itself. Then this distance is called the margin and the examples (points) which lie on the margins are called Support Vectors, hence the method’s name. And this is where the similarity to our homework problem appears (finally!) only this time, instead of trying to minimize the distance from points to the line/plane, we are trying to maximize them or greatest-squares fitting.

There’s a 5 minute crash course in machine learning :).

(Note: I don’t think greatest squares fitting is an actual official term)

Links:

CS478 Course Website: http://www.cs.cornell.edu/courses/cs478/2008sp/

Wikipedia Article: http://en.wikipedia.org/wiki/Support_vector_machine

Posted in Topics: Uncategorized

Jump down to leave a comment.

One response to “Greatest Squares Fitting”

  1. Экс-сотрудника ЮКОСа признали виновным в организации 3 убийств и 4 поку Says:

    […] Greatest Squares Fitting […]

Leave a Comment

You must be logged in to post a comment.



* You can follow any responses to this entry through the RSS 2.0 feed.