Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models
Open Access
- 16 March 2006
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (1), 142
- https://doi.org/10.1186/1471-2105-7-142
Abstract
Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. What distinguishes HGT from gene genesis, duplication or mutation is its speed, which enables rapid adaptation to changing environmental demands. A precise characterization of HGT requires algorithms that identify transfer events with high reliability. The transferred pieces of DNA are frequently long and comprise several genes. They are called genomic islands (GIs) or, more specifically, pathogenicity or symbiotic islands.

We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene. It is based on an analysis of the codon usage (CU) of each individual gene of the genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. In this way, we determine the states and emission probabilities of an inhomogeneous hidden Markov model that operates at the gene level. For the transition probabilities, we draw on classical test theory in order to integrate a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file in EMBL format and generates output in the common GFF format, which genome browsers can read. Benchmark tests showed that the output of SIGI-HMM agrees with known findings: its predictions were consistent both with annotated GIs and with predictions generated by other methods.

SIGI-HMM is a sensitive tool for identifying GIs in microbial genomes. It allows users to analyze genomes interactively and in detail, and to generate or test hypotheses about the origin of acquired genes.
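The abstract says that the codon usage of each gene is compared against a set of CU tables for candidate donors. One simple way to make such a comparison concrete is sketched below. It is an illustration under stated assumptions, not SIGI-HMM's actual test statistic. The sketch assumes that a CU table is a pseudocount-smoothed frequency over all 64 codons and that a gene is scored by the per-codon log-likelihood ratio between its best-fitting donor table and the host table. The function names and the pseudocount are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_usage_table(genes, pseudocount=1.0):
    """Pool codon counts over a gene set into relative frequencies.

    The (hypothetical) pseudocount keeps unseen codons from getting probability zero.
    """
    total = Counter()
    for cds in genes:
        total.update(codon_counts(cds))
    denom = sum(total[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (total[c] + pseudocount) / denom for c in CODONS}


def log_likelihood(cds, table):
    """Log-probability of a gene's codons under a CU table; codons with N etc. are skipped."""
    return sum(n * math.log(table[c]) for c, n in codon_counts(cds).items() if c in table)


def score_gene(cds, host, donors):
    """Return (best donor name, per-codon log-likelihood ratio of that donor vs host).

    A positive ratio means the gene's codon usage fits the donor table better than the host table.
    """
    n = sum(k for c, k in codon_counts(cds).items() if c in host)
    ll_host = log_likelihood(cds, host)
    name, ll_best = max(
        ((d, log_likelihood(cds, t)) for d, t in donors.items()), key=lambda x: x[1]
    )
    return name, (ll_best - ll_host) / max(n, 1)


if __name__ == "__main__":
    host = codon_usage_table(["GCCGCGGCGGCC" * 10])   # toy GC-rich "host"
    at_rich = codon_usage_table(["AAATTTAAATTT" * 10])  # toy AT-rich "donor"
    print(score_gene("AAATTTAAA" * 5, host, {"at_rich": at_rich}))
```

Normalizing by the number of codons keeps scores comparable between short and long genes. A real tool would also need to handle highly expressed genes separately, as the abstract notes, because their codon bias can mimic an alien signal.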
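The abstract describes a hidden Markov model that works at the gene level, with one observation per gene. A minimal two-state version (native vs. alien) can be decoded with the standard Viterbi algorithm, as in the sketch below. The sketch makes several assumptions. The per-gene emission log-probabilities are taken as given inputs, whereas SIGI-HMM derives them from its codon usage tests. The fixed transition probabilities `p_enter` and `p_leave` are a hypothetical stand-in for SIGI-HMM's test-theory-based transitions and sensitivity controller. The model also has only two states, whereas SIGI-HMM's model has more (for example, one per putative donor).

```python
import math


def viterbi_islands(emit_logp, p_enter, p_leave):
    """Most probable native/alien labelling of genes in genome order (Viterbi decoding).

    emit_logp: list of (log P(gene | native), log P(gene | alien)), one pair per gene.
    p_enter:   P(native -> alien) between neighbouring genes; hypothetical sensitivity knob.
    p_leave:   P(alien -> native).
    Returns a list of 0 (native) / 1 (alien) labels.
    """
    if not emit_logp:
        return []
    states = (0, 1)
    trans = [
        [math.log(1 - p_enter), math.log(p_enter)],
        [math.log(p_leave), math.log(1 - p_leave)],
    ]
    # Initialise as if the (virtual) gene before the first one was native.
    score = [trans[0][s] + emit_logp[0][s] for s in states]
    backptr = []
    for emit in emit_logp[1:]:
        prev, new_score = [], []
        for s in states:
            # Best predecessor state r for entering state s at this gene.
            best = max(states, key=lambda r: score[r] + trans[r][s])
            prev.append(best)
            new_score.append(score[best] + trans[best][s] + emit[s])
        backptr.append(prev)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for prev in reversed(backptr):
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    native = (math.log(0.9), math.log(0.1))
    alien = (math.log(0.1), math.log(0.9))
    genes = [native, native, alien, alien, alien, native, native]
    print(viterbi_islands(genes, p_enter=0.1, p_leave=0.1))
```

A small `p_enter` makes the model reluctant to open an island. A single alien-looking gene is then absorbed into its native neighbours, while a run of several alien-looking genes is still labelled as an island. Raising `p_enter` toward 0.5 makes the decoding follow each gene's emissions more closely. This is one simple way to trade sensitivity against specificity.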
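According to the abstract, SIGI-HMM writes its predictions in GFF so that genome browsers can display them. The sketch below shows how per-gene native/alien labels might be merged into island intervals and written as GFF lines. It assumes GFF version 3, with nine tab-separated columns and a `##gff-version 3` header, and uses the Sequence Ontology term `genomic_island` as the feature type. The source name `sigi_sketch` and the `ID=GI…` attribute scheme are hypothetical. SIGI-HMM's actual GFF version, columns and attributes may differ.

```python
def merge_islands(genes, labels):
    """Merge runs of consecutive alien-labelled genes into island coordinates.

    genes:  list of (start, end) 1-based inclusive coordinates in genome order.
    labels: list of 0 (native) / 1 (alien), e.g. a Viterbi path.
    """
    regions = []
    run_start = run_end = None
    for (start, end), label in zip(genes, labels):
        if label == 1:
            if run_start is None:
                run_start = start
            run_end = end
        elif run_start is not None:
            regions.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        regions.append((run_start, run_end))
    return regions


def to_gff3(seqid, regions, source="sigi_sketch"):
    """Render island regions as GFF3 text: one feature line with nine tab-separated columns each."""
    lines = ["##gff-version 3"]
    for i, (start, end) in enumerate(regions, 1):
        # seqid, source, type, start, end, score, strand, phase, attributes
        cols = [seqid, source, "genomic_island", str(start), str(end), ".", ".", ".", f"ID=GI{i}"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genes = [(1, 900), (1001, 2000), (2101, 3000), (3101, 4000), (4101, 5000)]
    print(to_gff3("chromosome", merge_islands(genes, [0, 1, 1, 0, 1])), end="")
```

Strand and score are left as `.` because an island spans genes on both strands and this sketch assigns no island-level score.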