The system under development, VISIONS, is an investigation into general issues in the construction of computer vision systems. The goal is to provide an analysis of color images of outdoor scenes, from segmentation (or partitioning) of an image through the final stages of symbolic interpretation of that image. The output of the system is intended to be a symbolic representation of the three-dimensional world depicted in the two-dimensional image, including the naming of objects, their placement in three-dimensional space, and the ability to predict from this representation the rough appearance of the scene from other points of view. Research in segmentation and interpretation has been separated into the development of two major subsystems with quite different methodologies and considerations. The focus of this paper is the interpretation system. The primary emphasis is on strategies by which several knowledge sources (KSs) can be integrated using expected knowledge stored in structures called 3D and 2D schemas, each of which may be general or specific to the scene under consideration. A series of increasingly difficult experiments is outlined as an experimental methodology for developing schema-driven (i.e., top-down) control mechanisms; each succeeding experiment assumes a weaker set of constraints, representing image interpretation tasks in which less knowledge of the situation is available. Experimental results show the current capabilities of a number of KSs and the effectiveness of a specific 2D schema in the interpretation of a scene.