Design of a discriminating fingerprint for G-protein-coupled receptors

Abstract
A systematic method for designing discriminating protein sequence fingerprints is described. The approach is iterative, and diagnostic performance is evaluated in terms of the relative abilities of sequences to match individual elements of the fingerprint. The method allows complete protein folds to be characterized in terms of a number of separate ‘features’, without the requirement to define specific intervals between them. It is described here with reference to the derivation of a fingerprint for G-protein-coupled receptors, which comprises the seven hydrophobic regions shown by protein chemistry approaches to be membrane-spanning. The fingerprint is potently diagnostic of all sequences of this type in the database in which it was derived (the OWL composite sequence database, version 8.1), and has continued to perform well on subsequent database updates, identifying 240 receptors in OWL17.0. Results are compared with a commonly used pattern template for this class of receptors. The investigation suggests that discriminating power is improved in the fingerprint approach because the recognition of individual features is made mutually conditional. Furthermore, by avoiding the definition of predetermined feature separations, members of protein families possessing all or only part of the fingerprint may be identified.
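As an illustration only, the idea of matching several features without fixed separations can be sketched as follows. This is a minimal sketch under stated assumptions, not the scoring scheme used in the study: the residue-identity score, the threshold of 0.8, and the names `best_window`, `match_fingerprint` and `classify` are all hypothetical. Each feature is scanned independently along the whole sequence, and the sequence is then reported as a full or partial match according to how many features were found.

```python
from typing import List, Tuple

# A motif is a list of aligned, equal-length example segments (hypothetical format).
Motif = List[str]


def best_window(sequence: str, motif: Motif) -> Tuple[int, float]:
    """Return (start, score) of the best ungapped window for one motif.

    Assumed scoring: the fraction of window positions whose residue occurs
    in the corresponding column of the motif examples.
    """
    width = len(motif[0])
    columns = [set(segment[i] for segment in motif) for i in range(width)]
    best = (-1, 0.0)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(res in col for res, col in zip(window, columns)) / width
        if score > best[1]:
            best = (start, score)
    return best


def match_fingerprint(sequence: str, fingerprint: List[Motif],
                      threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Return (motif_index, start) for every motif matched above threshold.

    No spacing between motifs is imposed: each one may match anywhere.
    """
    hits = []
    for index, motif in enumerate(fingerprint):
        start, score = best_window(sequence, motif)
        if score >= threshold:
            hits.append((index, start))
    return hits


def classify(hits: List[Tuple[int, int]], n_motifs: int) -> str:
    """Label a sequence by how much of the fingerprint it matches."""
    if len(hits) == n_motifs:
        return "full"
    return "partial" if hits else "none"


# Toy two-motif fingerprint; real fingerprints would use many more examples.
toy_fingerprint = [["ACDE", "ACDF"], ["WWYY"]]
full_hits = match_fingerprint("GGACDFGGGGWWYYGG", toy_fingerprint)
partial_hits = match_fingerprint("GGACDEGGGG", toy_fingerprint)
```

Here the first toy sequence matches both motifs and the second matches only the first, so `classify` labels them as full and partial matches respectively.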