While working on project 2, I found that much of my time went to running my code on the later challenges over and over. Getting decent results required a lot of trial and error with the parameter epsilon. Even with a good resolution and after narrowing epsilon down to a decent range, it was still a strain to make out what the light source images said. On a couple of occasions, while sitting around waiting for my code to finish running for the millionth time, I thought to myself “…there has to be a better way”.
I did end up thinking of one way things could have been a bit more automated, but ironically only after I had finished figuring out the details of the challenge images. I found a font on my computer, called OCR A Std, which looked identical, or at least nearly identical, to the one used on the last two challenge images. This was an “aha” moment for me: OCR stands for Optical Character Recognition, a class of techniques that allow computers to “read” text off of an image. Wouldn’t it have been nice to have a program that could calculate light images for different values of epsilon, attempt to use an OCR algorithm to read the text of the image, and then refine its choice for epsilon based on the results and try again? Anybody could have let a program like that run overnight and woken up to nice results.
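Here is a minimal sketch of what that overnight loop might look like, written in Python. Everything in it is an assumption for illustration: `refine_epsilon` is a hypothetical helper, and `toy_score` is only a stand-in for the real work of computing a light image for a given epsilon, running OCR on it, and returning some measure of how readable the result was. The search itself is a simple coarse-to-fine grid search, which is just one of many ways the "refine and try again" step could be done.

```python
def refine_epsilon(score, lo, hi, rounds=4, samples=9):
    """Coarse-to-fine grid search for the epsilon in [lo, hi] with the highest score.

    `score` is any function mapping epsilon to a number (higher is better).
    """
    lo0, hi0 = lo, hi
    best_eps, best_score = lo, score(lo)
    for _ in range(rounds):
        step = (hi - lo) / (samples - 1)
        for i in range(samples):
            eps = lo + i * step
            s = score(eps)
            if s > best_score:
                best_eps, best_score = eps, s
        # Zoom in to one grid step on either side of the best value so far,
        # without leaving the original search range.
        lo = max(lo0, best_eps - step)
        hi = min(hi0, best_eps + step)
    return best_eps, best_score


def toy_score(eps):
    # Stand-in for "compute the light image with eps, run OCR, return a
    # readability score". This toy version simply peaks at eps = 0.37.
    return -(eps - 0.37) ** 2


best_eps, best_score = refine_epsilon(toy_score, 0.0, 1.0)
print(best_eps)
```

In a real run, each call to `score` would be as slow as one full run of the project code, so the number of rounds and samples would have to be chosen with that cost in mind.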
So, what goes into a good character recognition algorithm, anyway? There are two main techniques used in optical character recognition. The first, more primitive way is for the computer to have a database of all possible characters, then look at the image and attempt to “parse” it into smaller segments, each containing a single character. The shape of each segmented character can then be compared directly to each of the standard images, perhaps by subtracting the two images to find the error like we did in the project. This technique has apparently been in use since at least the 60s, which is when the OCR A font was created for exactly this purpose.
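To make the first technique concrete, here is a small Python sketch of the comparison step. It assumes the image has already been segmented into single characters and thresholded to black-and-white 5x3 bitmaps; the tiny `TEMPLATES` database and the function names are made up for illustration. On binary images, counting mismatched pixels gives the same number as subtracting the two images and summing the absolute differences.

```python
# Hypothetical "database" of standard characters as 5x3 binary bitmaps.
TEMPLATES = {
    "I": ["111",
          "010",
          "010",
          "010",
          "111"],
    "L": ["100",
          "100",
          "100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010",
          "010",
          "010"],
}


def pixel_error(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def classify(glyph, templates=TEMPLATES):
    """Return the label of the template with the smallest pixel error."""
    return min(templates, key=lambda label: pixel_error(glyph, templates[label]))


noisy_t = ["111",
           "010",
           "010",
           "000",   # one stem pixel lost to noise
           "010"]

print(classify(noisy_t))  # prints T
```

The noisy glyph differs from the "T" template in one pixel, from "I" in three, and from "L" in nine, so the closest match is still "T". This only works when the glyph and the templates share the same size and alignment, which is why a known standard font matters so much for this method.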
The other, more advanced form of OCR involves machine learning techniques. Instead of just being told what a canonical example of each letter looks like, the algorithm can actually figure it out, and continually refine its definition by trial and error. A set of rules that defines the shape of each character can be modified to accommodate a newly encountered character whenever the program makes a mistake in classifying it. This is more robust than the first method in the sense that it can adapt to different handwriting or font styles, for example. The first method would probably be sufficient only if the characters being read were in an already known standard font.
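One very simple instance of this "adjust the rules after each mistake" idea is a multiclass perceptron, sketched below in Python. This is only an illustration under my own assumptions, not how any particular OCR product works: the "rules" are one weight per pixel per letter, the 5x3 sample glyphs are invented, and the names `to_vector`, `predict`, and `train` are hypothetical.

```python
def to_vector(glyph):
    """Flatten a bitmap given as rows of '0'/'1' into a list of floats."""
    return [1.0 if ch == "1" else 0.0 for row in glyph for ch in row]


def predict(weights, x):
    """Pick the label whose weight vector has the largest dot product with x."""
    return max(weights,
               key=lambda label: sum(w * xi for w, xi in zip(weights[label], x)))


def train(samples, labels, n_pixels, epochs=20):
    """Multiclass perceptron: weights change only when a guess is wrong."""
    weights = {label: [0.0] * n_pixels for label in labels}
    for _ in range(epochs):
        mistakes = 0
        for glyph, truth in samples:
            x = to_vector(glyph)
            guess = predict(weights, x)
            if guess != truth:
                mistakes += 1
                for i, xi in enumerate(x):
                    weights[truth][i] += xi   # pull the right letter toward x
                    weights[guess][i] -= xi   # push the wrong letter away
        if mistakes == 0:
            break  # every training sample is now classified correctly
    return weights


# Two invented "styles" each of the letters L and T.
samples = [
    (["100", "100", "100", "100", "111"], "L"),
    (["111", "010", "010", "010", "010"], "T"),
    (["100", "100", "100", "110", "111"], "L"),
    (["111", "010", "010", "010", "000"], "T"),
]

weights = train(samples, ["L", "T"], n_pixels=15)

# A glyph that was not in the training set: an L with a shorter foot.
unseen_l = ["100", "100", "100", "100", "110"]
print(predict(weights, to_vector(unseen_l)))
```

Unlike the template approach, nothing here says in advance what an "L" looks like. The weights start at zero and are shaped entirely by the mistakes made on the labeled samples, which is also why this kind of method needs example text to train on.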
OCR techniques have gotten quite efficient recently, but realistically they might not work so well in the contrived circumstances of this project: there’s no sample text with which to “train” a good algorithm, and the text is probably blurrier than most algorithms can actually classify. A more realistic benefit of OCR is the ability to read large amounts of text quickly, rather than the ability to decipher very blurry text. Companies like Adobe and Microsoft have been using these ideas for a while to convert scanned images of documents into pages of searchable and editable text.
Here’s an article that discusses different aspects of the OCR process, as it relates to scanning and parsing paper documents: http://www.computerworld.com/softwaretopics/software/apps/story/0,10801,73023,00.html
An interesting applet that demonstrates how a system called a neural network is used to train an OCR algorithm: http://www.sund.de/netze/applets/BPN/bpn2/ochre.html





