Informatics for Unveiling Hidden Genome Signatures

1 April 2003

journal article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 13 (4), 693-702
https://doi.org/10.1101/gr.634603

Abstract

With the increasing amount of available genome sequences, novel tools are needed for comprehensive analysis of species-specific sequence characteristics for a wide variety of genomes. We used an unsupervised neural network algorithm, a self-organizing map (SOM), to analyze di-, tri-, and tetranucleotide frequencies in a wide variety of prokaryotic and eukaryotic genomes. The SOM, which can cluster complex data efficiently, was shown to be an excellent tool for analyzing global characteristics of genome sequences and for revealing key combinations of oligonucleotides representing individual genomes. From analysis of 1- and 10-kb genomic sequences derived from 65 bacteria (a total of 170 Mb) and from 6 eukaryotes (460 Mb), clear species-specific separations of major portions of the sequences were obtained with the di-, tri-, and tetranucleotide SOMs. The unsupervised algorithm could recognize, in most 10-kb sequences, the species-specific characteristics (key combinations of oligonucleotide frequencies) that are signature features of each genome. We were able to classify DNA sequences within one species and among many species into subgroups that corresponded generally to biological categories. Because the classification power is very high, the SOM is an efficient and fundamental bioinformatic strategy for extracting a wide range of genomic information from a vast amount of sequences. [Supplemental material is available online at www.genome.org.]
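The pipeline summarized in the abstract, which turns each genomic fragment into a vector of oligonucleotide frequencies and then clusters those vectors on a SOM grid, can be illustrated with a minimal Python sketch. It assumes plain overlapping k-mer counts normalized to relative frequencies, and the classic sequential Kohonen update with a Gaussian neighborhood and linearly decaying learning rate; the paper's exact normalization and SOM variant may differ. All function names (`oligo_frequencies`, `train_som`, `best_matching_unit`, `random_genome`, `demo`) are hypothetical, and the input is synthetic i.i.d. sequence with chosen G+C contents, not real genome data.

```python
import itertools
import math
import random

ALPHABET = "ACGT"


def oligo_frequencies(seq, k):
    """Relative frequencies of all 4**k overlapping k-mers, in lexicographic order.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Sequential Kohonen SOM on a rows x cols grid (an assumed, generic variant)."""
    rng = random.Random(seed)
    # Initialize each node with a copy of a randomly chosen input vector.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1 - frac)                  # learning rate decays linearly toward 0
            sigma = max(sigma0 * (1 - frac), 0.5)  # neighborhood radius shrinks to a floor
            br, bc = divmod(best_matching_unit(weights, x), cols)
            for n, w in enumerate(weights):
                r, c = divmod(n, cols)
                # Gaussian neighborhood on the 2-D grid around the winning node.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


def random_genome(length, gc, rng):
    """Hypothetical i.i.d. sequence with the given G+C fraction."""
    probs = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]
    return "".join(rng.choices("GCAT", weights=probs, k=length))


def demo(seed=1):
    """Map 2-kb fragments of two synthetic 'genomes' onto a 4 x 4 dinucleotide SOM."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, gc in (("GC-rich", 0.75), ("AT-rich", 0.25)):
        genome = random_genome(20_000, gc, rng)
        for start in range(0, len(genome), 2_000):
            data.append(oligo_frequencies(genome[start:start + 2_000], 2))
            labels.append(label)
    weights = train_som(data, rows=4, cols=4, seed=seed)
    # Collect the grid nodes on which each synthetic "species" lands.
    nodes = {}
    for x, label in zip(data, labels):
        nodes.setdefault(label, set()).add(best_matching_unit(weights, x))
    return nodes


if __name__ == "__main__":
    for label, units in sorted(demo().items()):
        print(label, sorted(units))
```

Running the script prints, for each synthetic composition, the grid nodes its fragments map to. With compositions this far apart the two node sets would be expected not to overlap, a toy analogue of the species-specific separation described in the abstract; real genomic fragments differ far more subtly, which is why longer oligonucleotides (tri- and tetranucleotides, i.e. `k=3` or `k=4`) and larger maps matter.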
