Resolving the structural features of genomic islands: A machine learning approach
- 10 December 2007
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 18 (2), 331-342
- https://doi.org/10.1101/gr.7004508
Abstract
Large inserts of horizontally acquired DNA that contain functionally related genes with limited phylogenetic distribution are often referred to as genomic islands (GIs), and structural definitions of these islands, based on common features, have been proposed. Although a large number of mobile elements fall well within the GI definition, there are several concerns about the structural consensus for GIs: The current GI definition was put forward 10 yr ago when only 12 complete bacterial genomes were available, a large number of GIs deviate from that definition, and in silico predictions assuming a full or partial GI structural model bias the sampling of the GI structural space toward “well-structured” GIs. In this study, the structural features of genomic regions are sampled by a hypothesis-free, bottom-up search, and these are exploited in a machine learning approach with the aim of explicitly quantifying and modeling the contribution of each feature to the GI structure. In a whole-genome comparative analysis of 37 strains from three different genera against 12 outgroup genomes, 668 genomic regions were sampled and used to train structural GI models. The data show that, overall, GIs from the three different genera fall into distinct, genus-specific structural families. However, decreasing the taxonomic resolution, by studying GI structures across genus boundaries, yields models that converge on a fairly similar GI structure, further suggesting that GIs can be seen as a superfamily of mobile elements, with core and variable structural features, rather than a well-defined family.
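The abstract does not name the learning algorithm, so the following is only a hedged illustration of what "quantifying the contribution of each feature" can mean in practice. It fits an L2-regularised logistic regression (a stand-in chosen for illustration, not the authors' model) to binary presence/absence flags for each genomic region, and reads each learned weight as that feature's contribution to the log-odds of a region being labelled a GI. The feature names, the learning rate, the regularisation strength, and the toy data are all hypothetical; the data are synthetic and built so that only the `integrase` flag separates the two classes.

```python
import itertools
import math

# HYPOTHETICAL feature names (illustrative only; not the study's feature set).
FEATURES = ["integrase", "trna_flank", "direct_repeats", "atypical_gc"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """L2-regularised logistic regression fitted by batch gradient descent.

    Returns (weights, bias). Each weight is the change in log-odds of the
    'GI' label when that feature is present, holding the others fixed.
    The bias term is not regularised.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def toy_regions():
    """SYNTHETIC regions: label 1 ('GI') iff the integrase flag is set.

    The other three flags take every 0/1 combination in both classes,
    so they carry no information about the label.
    """
    X, y = [], []
    for label in (1, 0):
        for rest in itertools.product((0, 1), repeat=3):
            X.append([label, *rest])
            y.append(label)
    return X, y


X, y = toy_regions()
w, b = fit_logistic(X, y)
weights = dict(zip(FEATURES, w))
for name, wj in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {wj:+.3f}")
```

On this synthetic input the `integrase` weight comes out large and positive, while the three balanced flags converge to approximately zero because each occurs equally often in both classes. The L2 penalty keeps the weights finite even though the toy classes are perfectly separable.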