Identifying bacterial genes and endosymbiont DNA with Glimmer
Open Access
- 19 January 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (6), 673-679
- https://doi.org/10.1093/bioinformatics/btm009
Abstract
Motivation: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archæa and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host.
Results: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella.
Availability: Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer
Contact: adelcher@umiacs.umd.edu
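The host versus endosymbiont separation described in the abstract can be illustrated with a small sketch. It is not Glimmer's code: a fixed-order Markov chain stands in for the interpolated Markov model (a real IMM blends several context lengths, which this does not), and the function names, the model order, the pseudocount, and the toy training sequences are all assumptions made for illustration. The idea shown is only the general one of training one model per organism and assigning each read to whichever model gives it the higher log-likelihood.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Return log P(base | previous `order` bases), estimated with add-k smoothing.

    Simplification: a fixed-order chain, not an interpolated Markov model.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows that contain ambiguity codes such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1.0

    def log_prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return log_prob


def log_likelihood(seq, log_prob, order=2):
    """Sum the per-base log-probabilities of `seq` under one model."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if base in ALPHABET and all(c in ALPHABET for c in context):
            total += log_prob(context, base)
    return total


def classify(seq, host_model, symbiont_model, order=2):
    """Label a read by the sign of its log-likelihood ratio."""
    score = (log_likelihood(seq, symbiont_model, order)
             - log_likelihood(seq, host_model, order))
    label = "endosymbiont" if score > 0 else "host"
    return label, score


# Invented toy training data: an AT-rich "host" and a GC-rich "endosymbiont".
host_model = train_markov(["ATATATTTAAATATATAATTATAT"])
symbiont_model = train_markov(["GCGCGGCCGCGCCGGCGCGGCGCC"])

for read in ("GCGCGGCC", "ATATATAT"):
    label, score = classify(read, host_model, symbiont_model)
    print(read, label, round(score, 3))
```

In practice the two models would be trained on much larger sets of sequence known to come from each organism, and reads with a score near zero might be left unassigned instead of being forced into one class.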