Resolving the structural features of genomic islands: A machine learning approach

10 December 2007

journal article
research article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 18 (2), 331-342
https://doi.org/10.1101/gr.7004508

Abstract

Large inserts of horizontally acquired DNA that contain functionally related genes with limited phylogenetic distribution are often referred to as genomic islands (GIs), and structural definitions of these islands, based on common features, have been proposed. Although a large number of mobile elements fall well within the GI definition, there are several concerns about the structural consensus for GIs: The current GI definition was put forward 10 yr ago when only 12 complete bacterial genomes were available, a large number of GIs deviate from that definition, and in silico predictions assuming a full/partial GI structural model bias the sampling of the GI structural space toward “well-structured” GIs. In this study, the structural features of genomic regions are sampled by a hypothesis-free, bottom-up search, and these are exploited in a machine learning approach with the aim of explicitly quantifying and modeling the contribution of each feature to the GI structure. Performing a whole-genome-based comparative analysis between 37 strains of three different genera and 12 outgroup genomes, 668 genomic regions were sampled and used to train structural GI models. The data show that, overall, GIs from the three different genera fall into distinct, genus-specific structural families. However, decreasing the taxa resolution, by studying GI structures across different genus boundaries, provides models that converge on a fairly similar GI structure, further suggesting that GIs can be seen as a superfamily of mobile elements, with core and variable structural features, rather than a well-defined family.
