Conserved codon composition of ribosomal protein coding genes in Escherichia coli, Mycobacterium tuberculosis and Saccharomyces cerevisiae: lessons from supervised machine learning in functional genomics
Open Access
- 1 June 2002
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 30 (11), 2599-2607
- https://doi.org/10.1093/nar/30.11.2599
Abstract
Genomics projects have resulted in a flood of sequence data. Functional annotation currently relies almost exclusively on inter-species sequence comparison and is restricted in cases of limited data from related species and widely divergent sequences with no known homologs. Here, we demonstrate that codon composition, a fusion of codon usage bias and amino acid composition signals, can accurately discriminate, in the absence of sequence homology information, cytoplasmic ribosomal protein genes from all other genes of known function in Saccharomyces cerevisiae, Escherichia coli and Mycobacterium tuberculosis using an implementation of support vector machines, SVMlight. Analysis of these codon composition signals is instructive in determining features that confer individuality to ribosomal protein genes. Each of the sets of positively charged, negatively charged and small hydrophobic residues, as well as codon bias, contributes to their distinctive codon composition profile. The representation of all these signals is sensitively detected, combined and augmented by the SVMs to perform an accurate classification. Of special mention is an obvious outlier, yeast gene RPL22B, highly homologous to RPL22A but employing very different codon usage, perhaps indicating a non-ribosomal function. Finally, we propose that codon composition be used in combination with other attributes in gene/protein classification by supervised machine learning algorithms.
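To make the notion of codon composition concrete, the following is a minimal sketch of how a coding sequence might be turned into a 64-dimensional feature vector of relative codon frequencies, the kind of input a support vector machine could classify. The function name `codon_composition`, the fixed alphabetical codon ordering, and the normalization by total codon count are assumptions made for illustration; the abstract does not specify the exact encoding used with SVMlight.

```python
from itertools import product

# All 64 codons in a fixed order, so every gene maps to the same feature layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_composition(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Illustrative assumption: frequencies are normalized by the number of
    unambiguous in-frame codons; the original study may normalize differently.
    """
    seq = cds.upper().replace("U", "T")
    counts = dict.fromkeys(CODONS, 0)
    # Read in frame, three nucleotides at a time; a trailing partial codon is ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy sequence (not a real gene): ATG AAA AAA GCT TAA
vector = codon_composition("ATGAAAAAAGCTTAA")
```

Because a vector of this kind records which synonymous codon is used as well as which amino acid is encoded, it carries both codon usage bias and amino acid composition, the two signals the abstract describes as fused.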