Visual information processing: the structure and creation of visual representations

Abstract
For human vision to be explained by a computational theory, the first question is plain: What are the problems that the brain solves when we see? It is argued that vision is the construction of efficient symbolic descriptions from images of the world. An important aspect of vision is therefore the choice of representations for the different kinds of information in a visual scene. An overall framework is suggested for extracting shape information from images, in which the analysis proceeds through three representations: (1) the primal sketch, which makes explicit the intensity changes and local two-dimensional geometry of an image; (2) the 2½-D sketch, which is a viewer-centred representation of the depth, orientation and discontinuities of the visible surfaces; and (3) the 3-D model representation, which allows an object-centred description of the three-dimensional structure and organization of a viewed shape. The critical act in formulating computational theories for processes capable of constructing these representations is the discovery of valid constraints on the way the world behaves that provide sufficient additional information to allow recovery of the desired characteristic. Finally, once a computational theory for a process has been formulated, algorithms for implementing it may be designed, and their performance compared with that of the human visual processor.