Optimizing Parameters of Numerical Methods

Lots of numerical methods have parameters (i.e. tolerance, momentum …). While we would want to optimize the parameters and not use the default values (i.e. in 322 project), it is often very difficult to optimize them.
In general, what people do is prepare a test set, run your numerical methods with different parameter values, and see at which value the method does the best (i.e. smallest error, largest efficiency …). Assuming that the test set represent the data space well, we would be able to optimize our parameters. However, when parameters are in real number, it’s not possible to try every possible value and find the best one. While we can test values discretely and achieve near-optimality, having to search the whole space is still cumbersome and, due to the curse of dimensionality, the search space will get larger exponentially as the number of parameters increases. To overcome these difficulties, we would have to come up with some schemes.
One of the commonly used scheme is hill climbing. While exhaustively searching the space is too cumbersome, this method requires much less number of evaluations. We assume that there is only one local maximum, which equals to the global maximum, start from a random value (or the center), and change the parameter(s) toward the direction in which the method does better, or “climb hill”. When there is one parameter, we can just increase or decrease it with small enough step size, and when there are many parameters, we can take the direction to which the gradient (or the change) is the largest. There are some variation to this method, such as varying the step size during the execution and restarting the hill climbing for multiple times in case we fall into non-global local maximum.
Even with hill climbing, it might require quite a number of evaluations and it would be problematic if each evaluation takes a long time: for example, rendering images for the previous images took a long time. In this case, we can spatially sample some representative points from the parameter-space, and build a model, such as k-nearest-neighbors or artificial-neural-net model, to predict the result without actually evaluating it. For example, we can evaluate 50 representative points, build a model from the points, and inquire the model 500 times to do hill climbing or even exhaustively search the parameter-space. While the parameters-result function does not behave abruptly, even with very high dimension, we can build a quite accurate model and find a nearly-optimal point with very small number of evaluations.

Reference:
1. Wikipedia – hill climbing

p.s. I completely forgot about the blogging assignment and got late for this third post - my apology…

Posted in Topics: Education

Jump down to leave a comment.

Leave a Comment

You must be logged in to post a comment.



* You can follow any responses to this entry through the RSS 2.0 feed.