What constitutes a scene and how do you define a meaningful vocabulary for a scene? This is a very challenging problem that has immense importance in the field of object recognition. The following video talks about using the Max Margin Model to obtain new annotations for different images using existing annotations for other images.
http://www.youtube.com/watch?v=OLV1IIpTPow
The basic idea is that we start with matrices Y and X that contain the annotations and features respectively, and a linear classifier matrix W. W is then decomposed into a matrix F that maps the feature space, and another linear classifier G. We can then use F to make a transformation from the image space, where visually similar objects are close and visually dissimilar objects are far to the Latent Space where correlated objects are close together. The matrix G is then used to transform the latent space into the word classifier that adds new annotations for each image.






Leave a Comment
You must be logged in to post a comment.
* You can follow any responses to this entry through the RSS 2.0 feed.