Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi

Abstract
The Escherichia coli K-12 genome (ECO) was compared with the sampled genomes of the sibling species Salmonella enterica serovars Typhimurium, Typhi and Paratyphi A (collectively referred to as SAL) and with the genome of the close outgroup Klebsiella pneumoniae (KPN). There are at least 160 locations where sequences of >400 bp are absent from ECO but present in the genomes of all three SAL, and 394 locations where sequences are present in ECO but close homologs are absent from all SAL genomes. The 394 sequences in ECO that do not occur in SAL contain 1350 (30.6%) of the 4405 ECO genes. Of these, 1165 are missing from both SAL and KPN. Most of the 1165 genes are concentrated within 28 regions of 10–40 kb, which consist almost exclusively of such genes. Six of these regions include previously identified cryptic phage. A hypothetical ancestral state of genomic regions that differ between ECO and SAL can be inferred in some cases by reference to the genome structure in KPN and in the more distant relative Yersinia pestis. However, many changes between ECO and SAL are concentrated in regions where each of the four genera has a different structure. The rate of gene insertion and deletion is sufficiently high in these regions that the ancestral state of the ECO/SAL lineage cannot be inferred from the present data. The sequencing of other closely related genomes, such as S. bongori or Citrobacter, may help in this regard.
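
The gene counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original analysis; the variable and function names are illustrative assumptions, and only the three counts (4405, 1350 and 1165) are taken from the abstract. The remainder it derives, ECO genes absent from SAL but not scored as missing from KPN, is simply the difference of the two reported counts.

```python
# Counts taken from the abstract; all names below are illustrative assumptions.
ECO_GENES_TOTAL = 4405          # genes in the E. coli K-12 genome (ECO)
ABSENT_FROM_SAL = 1350          # ECO genes with no close homolog in any SAL genome
ABSENT_FROM_SAL_AND_KPN = 1165  # subset of the above also missing from KPN


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# ECO genes absent from SAL but not counted as missing from KPN
# (derived by subtraction, not reported directly in the abstract).
absent_from_sal_not_kpn = ABSENT_FROM_SAL - ABSENT_FROM_SAL_AND_KPN

summary = {
    "absent_from_SAL_pct_of_ECO": percent(ABSENT_FROM_SAL, ECO_GENES_TOTAL),
    "absent_from_SAL_and_KPN_pct_of_ECO": percent(ABSENT_FROM_SAL_AND_KPN, ECO_GENES_TOTAL),
    "absent_from_SAL_not_KPN_count": absent_from_sal_not_kpn,
}

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

The first entry reproduces the 30.6% figure stated in the abstract; the other two follow from the reported counts under the assumption that the 1165 genes are a strict subset of the 1350.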