Genomic insights that advance the species definition for prokaryotes

Abstract
To help advance the species definition for prokaryotes, we compared the gene content of 70 closely related, fully sequenced bacterial genomes to determine whether species boundaries exist and to assess the role of each organism's ecology in shaping its shared gene content. We found the average nucleotide identity (ANI) of the genes shared between two strains to be a robust measure of genetic relatedness among strains, and that ANI values of ≈94% corresponded to the traditional 70% DNA–DNA reassociation standard of the current species definition. At the 94% ANI cutoff, the current species definition encompasses only moderately homogeneous strains. For example, most of the >4-Mb genomes share only 65–90% of their genes, apparently because the strains evolved in different ecological settings. Furthermore, diagnostic genetic signatures (boundaries) are evident between groups of strains within the same species, and the genetic similarity between such groups can be as high as 98–99% ANI, indicating that justifiable species might be found even among organisms that are nearly identical at the nucleotide level. Notably, a large fraction (up to 65%) of the differences in gene content within species is associated with bacteriophage and transposase elements, revealing an important role for these elements during bacterial speciation. Our findings are consistent with a species definition that would encompass a more homogeneous set of strains than the current definition does and that considers the ecology of the strains in addition to their evolutionary distance.
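The two quantities the abstract relies on, ANI over shared genes and the fraction of genes two strains share, can be illustrated with a minimal sketch. It assumes that per-gene percent identities for the shared genes have already been obtained (for example, from reciprocal best matches between the two genomes). It also assumes that ANI is a simple unweighted mean of those identities and that the shared-gene fraction is taken relative to one strain's gene set. These are illustrative choices, not necessarily the exact procedure used in the study. All function names and the toy gene data are hypothetical.

```python
from statistics import mean

# Cutoff reported in the abstract as roughly equivalent to the
# 70% DNA-DNA reassociation standard.
SPECIES_ANI_CUTOFF = 94.0


def average_nucleotide_identity(shared_gene_identities: dict[str, float]) -> float:
    """Mean percent nucleotide identity over the genes two strains share.

    Assumption: an unweighted mean over genes; the per-gene identities are
    supplied by the caller (e.g. from reciprocal best matches).
    """
    if not shared_gene_identities:
        raise ValueError("strains share no genes; ANI is undefined")
    return mean(shared_gene_identities.values())


def shared_gene_fraction(genes_a: set[str], genes_b: set[str]) -> float:
    """Fraction of strain A's genes that are also present in strain B.

    Assumption: the denominator is strain A's gene count.
    """
    if not genes_a:
        raise ValueError("strain A has no genes")
    return len(genes_a & genes_b) / len(genes_a)


def same_species(ani: float, cutoff: float = SPECIES_ANI_CUTOFF) -> bool:
    """True if the ANI meets or exceeds the species cutoff."""
    return ani >= cutoff


if __name__ == "__main__":
    # Hypothetical toy data: identities (%) for three shared genes.
    identities = {"geneA": 95.0, "geneB": 93.0, "geneC": 97.0}
    ani = average_nucleotide_identity(identities)
    print(f"ANI = {ani:.1f}%, same species: {same_species(ani)}")

    # Hypothetical gene contents of two strains.
    strain_a = {"geneA", "geneB", "geneC", "geneD"}
    strain_b = {"geneA", "geneB", "geneC", "geneE"}
    print(f"shared gene fraction = {shared_gene_fraction(strain_a, strain_b):.2f}")
```

With this toy input the ANI is 95.0%, above the 94% cutoff, even though the two strains share only 75% of strain A's genes. This mirrors the abstract's point that strains grouped at the current cutoff can still differ substantially in gene content.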