Microbial gene identification using interpolated Markov models
Open Access
- 1 January 1998
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 26 (2), 544-548
- https://doi.org/10.1093/nar/26.2.544
Abstract
This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this system has proven to be very accurate at locating virtually all the genes in these sequences, outperforming previous methods. A conservative estimate based on experiments on H. pylori and H. influenzae is that the system finds >97% of all genes. GLIMMER uses interpolated Markov models (IMMs) as a framework for capturing dependencies between nearby nucleotides in a DNA sequence. An IMM-based method makes predictions based on a variable context, i.e. a variable-length oligomer in a DNA sequence. The context used by GLIMMER changes depending on the local composition of the sequence. As a result, GLIMMER is more flexible and more powerful than fixed-order Markov methods, which have previously been the primary content-based technique for finding genes in microbial DNA.
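The variable-context idea in the abstract can be illustrated with a minimal sketch of an interpolated Markov model. This is not GLIMMER's implementation: the function names (`train_counts`, `imm_prob`, `log_score`), the add-one smoothing, and the count-based weight `min(1, total / threshold)` are assumptions chosen for brevity, and the abstract does not specify how GLIMMER weights its contexts. The sketch only shows the general mechanism: estimates from longer contexts are trusted more when those contexts were seen often in training, and the model falls back to shorter contexts otherwise.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(seqs, max_order):
    """Count how often each base follows every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        for i, base in enumerate(seq):
            for k in range(max_order + 1):
                if i - k < 0:
                    break
                counts[seq[i - k:i]][base] += 1
    return counts


def imm_prob(counts, context, base, max_order, threshold=20):
    """Interpolated estimate of P(base | context).

    Starts from a uniform distribution and blends in estimates from
    progressively longer contexts. The weight given to each context grows
    with how often it was observed (hypothetical weighting scheme).
    """
    context = context[-max_order:] if max_order > 0 else ""
    prob = 1.0 / len(ALPHABET)
    for k in range(len(context) + 1):
        ctx = context[len(context) - k:]
        table = counts.get(ctx)
        total = sum(table.values()) if table else 0
        if total == 0:
            break  # longer contexts were never seen; keep the shorter estimate
        lam = min(1.0, total / threshold)
        # Add-one smoothing keeps every probability strictly positive.
        estimate = (table.get(base, 0) + 1) / (total + len(ALPHABET))
        prob = lam * estimate + (1.0 - lam) * prob
    return prob


def log_score(counts, seq, max_order, threshold=20):
    """Log-probability of a sequence under the interpolated model."""
    score = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - max_order):i]
        score += math.log(imm_prob(counts, context, base, max_order, threshold))
    return score


# Toy usage: a perfectly periodic training sequence.
counts = train_counts(["ACGT" * 50], max_order=2)
p_seen = imm_prob(counts, "CG", "T", max_order=2)    # context seen often
p_unseen = imm_prob(counts, "TT", "A", max_order=2)  # falls back to context "T"
```

In a gene finder, a score such as `log_score` would typically be computed under a model trained on coding sequence and compared with the score under a non-coding model. That comparison step is not shown here.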