Caltech Library System
Electronic Theses

Peters, Robert Jacob (2004-06-04) Visual attention and object categorization: from psychophysics to computational models. http://resolver.caltech.edu/CaltechETD:etd-06062004-213811


Type of Document Dissertation
Author Peters, Robert Jacob
Author's Email Address rjpeters AT klab.caltech.edu
URN etd-06062004-213811
Persistent URL http://resolver.caltech.edu/CaltechETD:etd-06062004-213811
Title Visual attention and object categorization: from psychophysics to computational models
Degree PhD
Option Computation and Neural Systems
Advisory Committee
Advisor Name Title
Christof Koch Committee Chair
Laurent Itti Committee Member
Pietro Perona Committee Member
Richard Andersen Committee Member
Shin Shimojo Committee Member
Keywords
  • computational models
  • multidimensional scaling
  • eye tracking
  • visual attention
  • saliency
  • visual object categorization
Date of Defense 2004-06-04
Availability unrestricted
Abstract
This thesis is arranged in two main parts. Each part relies on an approach using the methods of psychophysics and computational modeling to bring abstract or high-level theories of vision closer to a concrete neurobiological foundation.

The first part addresses the topic of visual object categorization. Previous studies using high-level models of categorization have left unresolved issues of neurobiological relevance, including how features are extracted from the image and the role played by memory capacity in categorization performance. We compared the ability of a comprehensive set of models to match the categorization performance of human observers while explicitly accounting for the models' numbers of free parameters. The most successful models did not require a large memory capacity, suggesting that a sparse, abstracted representation of category properties may underlie categorization performance. This type of representation--different from classical prototype abstraction--could also be extracted directly from two-dimensional images via a biologically plausible early vision model, rather than relying on experimenter-imposed features.

The second part addresses visual attention in its bottom-up, stimulus-driven form. Previous research showed that a model of bottom-up visual attention can account in part for the spatial positions of locations fixated by humans while free-viewing complex natural and artificial scenes. We used a similar framework to quantify how the predictive ability of such a model may be enhanced by new model components based on several specific mechanisms within the functional architecture of the visual system. These components included richer interactions among orientation-tuned units, both at short range (for clutter reduction) and at long range (for contour facilitation). Subjects free-viewed naturalistic and artificial images while their eye movements were recorded. The resulting fixation locations were compared with the models' predicted salience maps. We found that each new model component was important in attaining a strong quantitative correspondence between model and behavior. Finally, we compared the model predictions with the spatial locations obtained from a task that relied on mouse clicking rather than eye tracking. As these models become more accurate in predicting behaviorally relevant salient locations, they become useful to a range of applications in computer vision and human-machine interface design.

Files
  Approximate Download Time (Hours:Minutes:Seconds)

  Filename                  Size      28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
  rjpeters_thesis_2004.pdf  20.99 Mb  01:37:11    00:49:59   00:43:44      00:21:52       00:01:51

