Interacting with computers using images for search and automation

Title: Interacting with computers using images for search and automation
Publication Type: Thesis
Year of Publication: 2009
Authors: Tom Yeh
Date Published: 2009
University: Massachusetts Institute of Technology
City: Cambridge, MA, USA
Abstract

A picture is worth a thousand words. People use images extensively when interacting with one another to solve problems: for example, showing a picture of a bird to a bird expert to identify its species, or giving a picture of a cosmetic product to a husband so that he buys the right one. However, images have rarely been used to support similar interactions with computers.
In this thesis, I present a series of applications that let users interact with computers using images, and I develop several computer vision algorithms needed to support such interaction. On the application side, I examine two functional roles of images in human-computer interaction: search and automation. For search, I develop systems that let users obtain useful information about a location or a consumer product by photographing it with a camera phone, search online documentation about a GUI by taking a screenshot of it, and ask general questions using pictures in a community-based question-answering (QA) service. For automation, I design a visual scripting system that allows end-users to insert screenshots of GUI elements directly into program statements.
On the computer vision side, I describe the Adaptive Vocabulary Tree algorithm for indexing and searching a large and dynamic collection of images, the Dynamic Visual Category Learning algorithm for training and updating a set of dynamically changing object categories, the Vocabulary Tree SVM algorithm for fast object recognition by efficiently approximating the margins of a set of SVM classifiers, and the Multiclass Branch-and-Bound Window Search algorithm for simultaneously estimating the optimal location and label of an object in a large input image.
Finally, I demonstrate the usability of each proposed application through user studies, and the technical performance of each algorithm through a series of experiments on large datasets.
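The abstract names the Adaptive Vocabulary Tree without describing its mechanics. As a point of reference only, the following is a minimal sketch of a generic, non-adaptive vocabulary tree in the style of Nistér and Stewénius. Hierarchical k-means builds a tree of cluster centers. Each local descriptor is quantized by greedily descending to the nearest child at every level, which costs branching × depth distance computations rather than branching^depth. An inverted index then maps the resulting leaf "visual words" to the images that contain them. Unlike the algorithm described in the thesis, this tree is fixed after construction: new images can be indexed, but the tree itself is never updated. All names (`build_tree`, `quantize`, `InvertedIndex`), the deterministic farthest-point initialization, the toy 2-D descriptors, and the smoothed IDF vote are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of descriptors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def nearest(p: Vector, centers: List[Vector]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> List[List[float]]:
    """Lloyd's k-means with deterministic farthest-point (maximin) initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


@dataclass
class Node:
    center: List[float]
    children: List["Node"] = field(default_factory=list)
    word_id: int = -1  # assigned to leaves only


def build_tree(points: List[Vector], branching: int, depth: int,
               center: Optional[List[float]] = None,
               counter: Optional[List[int]] = None) -> Node:
    """Hierarchical k-means: split into `branching` clusters, recurse `depth` levels."""
    counter = counter if counter is not None else [0]
    node = Node(center=list(center) if center is not None else [])
    if depth == 0 or len(points) < branching:
        node.word_id = counter[0]  # each leaf is one visual word
        counter[0] += 1
        return node
    centers = kmeans(points, branching)
    groups: List[List[Vector]] = [[] for _ in centers]
    for p in points:
        groups[nearest(p, centers)].append(p)
    for c, g in zip(centers, groups):
        node.children.append(build_tree(g, branching, depth - 1, c, counter))
    return node


def quantize(root: Node, descriptor: Vector) -> int:
    """Greedy descent: compare only against the children of the current node."""
    node = root
    while node.children:
        node = min(node.children, key=lambda ch: sq_dist(descriptor, ch.center))
    return node.word_id


class InvertedIndex:
    """Maps visual words to the images containing them; the tree stays fixed."""

    def __init__(self, root: Node):
        self.root = root
        self.postings: Dict[int, Counter] = defaultdict(Counter)
        self.num_images = 0

    def add(self, image_id: str, descriptors: List[Vector]) -> None:
        self.num_images += 1
        for d in descriptors:
            self.postings[quantize(self.root, d)][image_id] += 1

    def query(self, descriptors: List[Vector]) -> List[Tuple[str, float]]:
        scores: Counter = Counter()
        for d in descriptors:
            posting = self.postings.get(quantize(self.root, d))
            if not posting:
                continue
            # Smoothed IDF: words found in fewer images contribute more.
            idf = math.log(1 + self.num_images / len(posting))
            for image_id, count in posting.items():
                scores[image_id] += idf * count
        return scores.most_common()


# Toy usage: four tight clusters of 2-D "descriptors" give four visual words.
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
corners = [(0, 0), (0, 3), (20, 0), (20, 3)]
training = [[cx + dx, cy + dy] for cx, cy in corners for dx, dy in offsets]
tree = build_tree(training, branching=2, depth=2)

index = InvertedIndex(tree)
index.add("A", [[0, 0], [0, 3]])
index.add("B", [[20, 0], [20, 3]])
index.add("C", [[0, 0], [20, 3]])
ranked = index.query([[0.05, 0.02], [0.02, 2.95]])
```

In this toy run, the query shares two visual words with image "A" and one with image "C", so "A" ranks first and "B" receives no votes. The scoring here is a simplified weighted vote, not the normalized TF-IDF vector comparison used in the original vocabulary-tree work.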