Data for EnteroBase: Hierarchical clustering of 100,000s of bacterial genomes into species/sub-species and populations
Achtman, M., Zhou, Zhemin, Charlesworth, Jane and Baxter, Laura (2022) Data for EnteroBase: Hierarchical clustering of 100,000s of bacterial genomes into species/sub-species and populations. [Dataset]
Plain Text (Readme file)
WRAP_dataset_162247_README.txt - Published Version. Available under License Creative Commons: Attribution-Noncommercial 4.0. Download (3444b)
Plain Text (Metadata file)
WRAP_dataset_162247_metadata.txt - Published Version. Available under License Creative Commons: Attribution-Noncommercial 4.0. Download (683b)
Archive (ZIP) (Dataset)
WRAP_dataset_162247.zip - Published Version. Available under License Creative Commons: Attribution-Noncommercial 4.0. Download (306Mb)
Official URL: http://wrap.warwick.ac.uk/162247/
Abstract
The definition of bacterial species is traditionally a taxonomic issue while bacterial populations are identified by population genetics. These assignments are species-specific, and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST allelic profiles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate HierCC’s ability to correctly assign 100,000s of genomes to the species/subspecies and population levels for Salmonella, Escherichia, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI was uniformly consistent with classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and with O serogroups in Salmonella. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
Item Type: | Dataset
---|---
Subjects: | Q Science > QH Natural history; Q Science > QR Microbiology
Divisions: | Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School
Type of Data: | Computational data
Library of Congress Subject Headings (LCSH): | Bacterial genomes, Salmonella -- Genetics, Escherichia -- Genetics, Yersinia -- Genetics, Vibrio -- Genetics, Streptococcus -- Genetics
Publisher: | Warwick Medical School
Official Date: | 4 April 2022
Dates: |
Status: | Not Peer Reviewed
Publication Status: | Published
Media of Output (format): | .gz, .fasta*, .presence*, .cgMLST, .xlsx, .pdf and .docx
Access rights to Published version: | Open Access (Creative Commons)
Copyright Holders: | University of Warwick
Description: | The full set of deposited files is listed below: Data Files: For details, refer to the Dataset (Zip), Readme and Metadata files
Date of first compliant deposit: | 4 April 2022
Date of first compliant Open Access: | 4 April 2022
RIOXX Funder/Project Grant: |
Related URLs: |
Contributors: |