Genomic diversity of Salmonella enterica - The UoWUCC 10K genomes project
Open Access
- 24 September 2020
- journal article
- Published by F1000 Research Ltd in Wellcome Open Research
Abstract
Background: Most publicly available genomes of Salmonella enterica are from human disease in the US and the UK, or from domesticated animals in the US.
Methods: Here we describe a historical collection of 10,000 strains isolated between 1891 and 2010 in 73 different countries. They encompass a broad range of sources, ranging from rivers through reptiles to the diversity of all S. enterica isolated on the island of Ireland between 2000 and 2005. Genomic DNA was isolated and sequenced by Illumina short-read sequencing.
Results: The short reads are publicly available in the Sequence Read Archive. They were also uploaded to EnteroBase, which assembled and annotated draft genomes. The 9769 draft genomes that passed quality control were genotyped with multiple levels of multilocus sequence typing and used to predict serovars. Genomes were assigned to hierarchical clusters on the basis of numbers of pairwise allelic differences in core genes, and these clusters were mapped to genetic Lineages within phylogenetic trees.
Conclusions: The University of Warwick/University College Cork (UoWUCC) project greatly extends the geographic sources, dates and core genomic diversity of publicly available S. enterica genomes. We illustrate these features by an overview of core genomic Lineages within 33,000 publicly available Salmonella genomes whose strains were isolated before 2011. We also present detailed examinations of HC400, HC900 and HC2000 hierarchical clusters within exemplar Lineages, including serovars Typhimurium, Enteritidis and Mbandaka. These analyses confirm the polyphyletic nature of multiple serovars while showing that discrete clusters with geographical specificity can be reliably recognized by hierarchical clustering approaches. The results also demonstrate that the genomes sequenced here provide an important counterbalance to the sampling bias that dominates current genomic sequencing.
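
The clustering step in the abstract (assigning genomes to hierarchical clusters by counting pairwise allelic differences in core genes) can be illustrated with a minimal sketch. It assumes single-linkage grouping, in which two genomes share a cluster when a chain of links at or below the threshold connects them, and it treats a label such as HC400 as a threshold of 400 allelic differences. The function names, the toy five-locus profiles and the handling of missing loci are hypothetical choices made for illustration; this is not EnteroBase's implementation.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci where two allele profiles differ.

    Loci that are missing (None) in either profile are skipped;
    this is an assumption made for the sketch.
    """
    return sum(
        1
        for x, y in zip(profile_a, profile_b)
        if x is not None and y is not None and x != y
    )


def hierarchical_clusters(profiles, threshold):
    """Group genomes by single linkage at a given allelic-distance threshold.

    profiles: dict mapping genome name -> list of allele numbers per locus.
    Returns a sorted list of sorted clusters (lists of genome names).
    """
    names = list(profiles)
    parent = {name: name for name in names}  # union-find forest

    def find(name):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: four genomes typed at five core loci (hypothetical values).
toy_profiles = {
    "A": [1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 2],
    "C": [1, 1, 2, 2, 2],
    "D": [3, 3, 3, 3, 3],
}

for cutoff in (1, 2, 5):
    print(cutoff, hierarchical_clusters(toy_profiles, cutoff))
```

In the toy data, genomes A and C differ at three loci yet fall into one cluster at a threshold of 2, because each is within two differences of B. Chaining of this kind is characteristic of single linkage, and it is why clusters at a looser threshold always contain the clusters found at a tighter one.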
Funding Information
- Health and Social Care Research and Development Division
- U.S. Department of Agriculture (6040-32000-009-00-D)
- Science Foundation of Ireland (05/FE1/B882)
- Wellcome Trust (202792)