Detecting Coevolution in and among Protein Domains

Open Access

2 November 2007

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 3 (11), e211-2134
https://doi.org/10.1371/journal.pcbi.0030211

Abstract

Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model to a large-scale screening of the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest that sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.

The sequences of different components within and across genes often undergo coordinated changes in order to maintain the structures or functions of the genes. Identifying the coordinated changes (the "coevolution") of those components in the context of evolution is important in predicting the structures, interactions, and functions of genes. The authors conduct a large-scale screening on all the known protein sequences and build a compendium of the coevolving relations of all protein domains (subunits of proteins). The majority of the coevolving protein domains either belong to the same proteins, appear in the same protein complexes, or share the same functional annotations. Furthermore, coevolving positions in the same proteins or protein complexes are spatially coupled, as they tend to be closer than random positions in the 3-D structures of the proteins/protein complexes. More strikingly, many coevolving positions are located at functionally important sites of the molecules. The results provide useful insights about the relations between sequence evolution and protein structures and functions.
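
To make the idea of an augmented continuous-time Markov process with one extra free parameter more concrete, the following is a minimal sketch and not the authors' actual parameterization. It assumes a toy two-letter alphabet and a hypothetical coupling rule: a single parameter `eps` scales down single-site substitutions and opens up simultaneous double substitutions, so that `eps = 1` recovers two independently evolving sites. The function names and the specific double-change rate are illustrative assumptions.

```python
import numpy as np

def independent_pair_generator(Q):
    """Rate matrix for two sites evolving independently (Kronecker sum of Q with itself)."""
    n = Q.shape[0]
    I = np.eye(n)
    return np.kron(Q, I) + np.kron(I, Q)

def coupled_pair_generator(Q, eps):
    """Toy coupled rate matrix over pair states (a, b), indexed as a * n + b.

    Assumed rule (illustrative only): single changes are scaled by eps,
    and simultaneous double changes get rate (1 - eps) * sqrt(q_a * q_b).
    """
    n = Q.shape[0]
    G = np.zeros((n * n, n * n))
    for a in range(n):
        for b in range(n):
            i = a * n + b
            for a2 in range(n):
                for b2 in range(n):
                    j = a2 * n + b2
                    if i == j:
                        continue
                    if a != a2 and b != b2:      # both sites change at once
                        G[i, j] = (1.0 - eps) * np.sqrt(Q[a, a2] * Q[b, b2])
                    elif a != a2:                # only the first site changes
                        G[i, j] = eps * Q[a, a2]
                    else:                        # only the second site changes
                        G[i, j] = eps * Q[b, b2]
            G[i, i] = -G[i].sum()                # rows of a generator sum to zero
    return G

def transition_matrix(G, t, terms=20):
    """exp(G * t) via scaling and squaring with a truncated Taylor series."""
    A = G * t
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = A / (2 ** s)
    P = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k
        P = P + term
    for _ in range(s):
        P = P @ P
    return P

# Toy single-site model: two states with symmetric unit rates (an assumption).
Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])

G_null = independent_pair_generator(Q)
G_coev = coupled_pair_generator(Q, eps=0.5)

# Probability of moving from pair state (0, 0) to (1, 1) along a short branch.
P_null = transition_matrix(G_null, t=0.1)
P_coev = transition_matrix(G_coev, t=0.1)
print("double change, independent:", P_null[0, 3])
print("double change, coupled:    ", P_coev[0, 3])
```

In a phylogenetic setting, transition matrices such as these would be evaluated along each branch of the tree, and the likelihood under the coupled model compared with the likelihood under the independent model for every pair of alignment columns. That comparison step is omitted here.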

Cited by 137 articles