Sesam: A relational database for structure and sequence of macromolecules

Abstract
A system is described that provides ways of integrating data on protein structure, sequence, and survey results with molecular graphics and molecular mechanics software. Its major component is the relational database SESAM, presently implemented under the commercial package SYBASE. By design, the database allows full integration, within the same data organization, of raw data on protein structure, sequence, ligands, and heterogroups, obtained from the Brookhaven Protein Data Bank, with pure sequence information available from other databanks such as SWISS-PROT. In addition, it contains higher-level descriptions of structural and topological properties, as well as survey results, obtained by executing specialized computer programs. Besides closely combining structural and nonstructural information, SESAM has other important features that distinguish it from analogous systems developed elsewhere. It includes a molecular dictionary with a complete description of the geometric properties and energy parameters used in modeling and conformational energy calculations. Using this dictionary, structural data are validated by checking for localized inconsistencies in atomic coordinates, atomic symbols, and chirality definitions, and by flagging errors and incomplete entries. Because of both the dictionary and the validation procedures, SESAM can be readily interfaced with conventional molecular graphics and mechanics software packages, or with other specialized application programs. With the aid of appropriate interfaces, data access is sufficiently fast for SESAM to be interrogated interactively. Prototypes of user interfaces, as well as an interface with the molecular graphics package BRUGEL, are described, and the power of the system is illustrated in applications such as homology-based protein modeling, computer-aided protein design, protein structure prediction, and analysis of local structure motifs and of relationships between protein sequence and structure.
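
The dictionary-based validation summarized above can be illustrated with a minimal sketch. It is not SESAM's actual schema or code: SESAM is implemented under SYBASE, whereas this sketch uses Python's built-in sqlite3 module as a stand-in, and every table name, column name, and data value (including the entry code "DEMO") is a hypothetical assumption made for illustration. It shows only how keeping sequence, structure, and a molecular dictionary in one relational organization lets incomplete or inconsistent residues be flagged with ordinary joins.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (protein_id INTEGER PRIMARY KEY, name TEXT, sequence TEXT);
        CREATE TABLE structure (structure_id INTEGER PRIMARY KEY,
                                protein_id INTEGER REFERENCES protein(protein_id),
                                entry_code TEXT);
        CREATE TABLE atom (structure_id INTEGER REFERENCES structure(structure_id),
                           residue_no INTEGER, residue_name TEXT, atom_name TEXT);
        -- Hypothetical molecular dictionary: atoms expected in each residue type.
        CREATE TABLE dictionary_atom (residue_name TEXT, atom_name TEXT);
        """
    )
    conn.execute("INSERT INTO protein VALUES (1, 'toy peptide', 'GG')")
    conn.execute("INSERT INTO structure VALUES (1, 1, 'DEMO')")
    conn.executemany(
        "INSERT INTO dictionary_atom VALUES ('GLY', ?)",
        [("N",), ("CA",), ("C",), ("O",)],
    )
    atoms = [
        # Residue 1 lacks its O atom (an incomplete entry).
        (1, 1, "GLY", "N"), (1, 1, "GLY", "CA"), (1, 1, "GLY", "C"),
        # Residue 2 is complete but carries an atom name unknown to the dictionary.
        (1, 2, "GLY", "N"), (1, 2, "GLY", "CA"), (1, 2, "GLY", "C"),
        (1, 2, "GLY", "O"), (1, 2, "GLY", "XX"),
    ]
    conn.executemany("INSERT INTO atom VALUES (?, ?, ?, ?)", atoms)
    return conn


def missing_atoms(conn):
    """Atoms the dictionary expects in a residue but the structure lacks."""
    return conn.execute(
        """
        SELECT s.entry_code, r.residue_no, d.atom_name
        FROM (SELECT DISTINCT structure_id, residue_no, residue_name FROM atom) AS r
        JOIN structure AS s ON s.structure_id = r.structure_id
        JOIN dictionary_atom AS d ON d.residue_name = r.residue_name
        LEFT JOIN atom AS a
               ON a.structure_id = r.structure_id
              AND a.residue_no = r.residue_no
              AND a.atom_name = d.atom_name
        WHERE a.atom_name IS NULL
        ORDER BY s.entry_code, r.residue_no, d.atom_name
        """
    ).fetchall()


def unknown_atoms(conn):
    """Atoms present in the structure whose names the dictionary does not list."""
    return conn.execute(
        """
        SELECT s.entry_code, a.residue_no, a.atom_name
        FROM atom AS a
        JOIN structure AS s ON s.structure_id = a.structure_id
        LEFT JOIN dictionary_atom AS d
               ON d.residue_name = a.residue_name AND d.atom_name = a.atom_name
        WHERE d.atom_name IS NULL
        ORDER BY s.entry_code, a.residue_no, a.atom_name
        """
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print("missing:", missing_atoms(db))
    print("unknown:", unknown_atoms(db))
```

In this toy layout the sequence sits in the same database as the coordinates, so a query can move from a sequence entry to its flagged residues through the shared protein key. Checks on coordinates or chirality would require geometric data and are omitted here.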