Gibbs motif sampling: Detection of bacterial outer membrane protein repeats

Abstract
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif‐encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin‐fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as illustrated using helix‐turn‐helix DNA‐binding proteins. Other statistically based procedures are described for searching a database for sequences that match motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403–410) fails to detect significant pairwise similarity between any of these sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins of known trimeric β‐barrel structure, and of related proteins, reveals a similar repetitive motif corresponding to alternating membrane‐spanning β‐strands. These β‐strands occur on the membrane interface (as opposed to the trimeric interface) of the β‐barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles.
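The abstract does not spell out how a Gibbs motif sampler proceeds. As a rough illustration only, the following is a minimal sketch of a basic single‐motif, one‐site‐per‐sequence Gibbs site sampler. It is not the multi‐motif partitioning algorithm described in this paper. The function names (`profile`, `background`, `gibbs_site_sampler`), the flat pseudocount, the fixed motif width, and the iteration count are all assumptions made for the example. At each step, one sequence is held out. A motif profile is built from the current sites in the other sequences, and a new site in the held‐out sequence is sampled in proportion to its motif‐versus‐background likelihood ratio.

```python
import math
import random
from collections import Counter


def profile(seqs, starts, width, skip, alphabet, pseudo=1.0):
    """Position-specific residue frequencies from every current site except seqs[skip].

    A flat pseudocount is an assumption of this sketch.
    """
    counts = [Counter() for _ in range(width)]
    n = 0
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        n += 1
        for j in range(width):
            counts[j][s[a + j]] += 1
    total = n + pseudo * len(alphabet)
    return [{c: (counts[j][c] + pseudo) / total for c in alphabet} for j in range(width)]


def background(seqs, starts, width, skip, alphabet, pseudo=1.0):
    """Residue frequencies outside the current sites, ignoring the held-out sequence."""
    counts = Counter()
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        counts.update(s[:a])
        counts.update(s[a + width:])
    total = sum(counts.values()) + pseudo * len(alphabet)
    return {c: (counts[c] + pseudo) / total for c in alphabet}


def gibbs_site_sampler(seqs, width, alphabet, iterations=200, seed=0):
    """Return one sampled motif start per sequence (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for z in range(len(seqs)):
            q = profile(seqs, starts, width, z, alphabet)
            p = background(seqs, starts, width, z, alphabet)
            s = seqs[z]
            # Log likelihood ratio (motif model vs. background) for every candidate start x.
            logw = [
                sum(math.log(q[j][s[x + j]] / p[s[x + j]]) for j in range(width))
                for x in range(len(s) - width + 1)
            ]
            m = max(logw)  # subtract the maximum before exponentiating, for numerical stability
            weights = [math.exp(v - m) for v in logw]
            starts[z] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    # Toy data: random protein-like sequences with a planted five-residue motif.
    AA = "ACDEFGHIKLMNPQRSTVWY"
    rng = random.Random(1)
    seqs = []
    for _ in range(8):
        bg = "".join(rng.choice(AA) for _ in range(40))
        pos = rng.randrange(len(bg) - 5)
        seqs.append(bg[:pos] + "WHEAT" + bg[pos:])
    print(gibbs_site_sampler(seqs, 5, AA))
```

Because each update conditions on all the other sites, a subtle motif shared across many sequences can gradually pull the sampled sites into alignment, even when no single pair of sequences looks similar. The model actually used in the paper, with multiple motifs and repeats, is considerably richer than this sketch.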