Alignment of whole genomes
Open Access
- 1 January 1999
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 27 (11), 2369-2376
- https://doi.org/10.1093/nar/27.11.2369
Abstract
A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications.
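The abstract does not spell out the alignment steps, so the following is only a rough illustration of the kind of query a suffix tree answers quickly: finding exact matches that occur exactly once in each of two sequences and cannot be extended in either direction. The sketch deliberately does not build a suffix tree; it uses a naive quadratic scan that is only practical for short strings. The function names `count_occurrences` and `maximal_unique_matches` and the `min_len` parameter are hypothetical, chosen for this example and not taken from the system described.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=3):
    """Return (pos_in_a, pos_in_b, length) for exact matches that are
    unique in both sequences and maximal on both sides.

    Naive illustration only: a suffix tree would find the same matches
    far faster on genome-sized input.
    """
    matches = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            segment = a[i:i + k]
            # Keep the match only if it occurs once in each sequence.
            if count_occurrences(a, segment) == 1 and count_occurrences(b, segment) == 1:
                matches.append((i, j, k))
    return matches


if __name__ == "__main__":
    print(maximal_unique_matches("GATTACA", "CATTAGA"))
```

Matches of this kind can serve as anchors between two sequences; the regions between anchors are where single nucleotide changes and other differences would show up.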