University of Warwick
Publications service & WRAP

Data for EnteroBase: Hierarchical clustering of 100,000s of bacterial genomes into species/sub-species and populations

Achtman, M., Zhou, Zhemin, Charlesworth, Jane and Baxter, Laura (2022) Data for EnteroBase: Hierarchical clustering of 100,000s of bacterial genomes into species/sub-species and populations. [Dataset]

Plain Text (Readme file)
WRAP_dataset_162247_README.txt - Published Version
Available under License Creative Commons: Attribution-Noncommercial 4.0.

Download (3444b)
Plain Text (Metadata file)
WRAP_dataset_162247_metadata.txt - Published Version
Available under License Creative Commons: Attribution-Noncommercial 4.0.

Download (683b)
Archive (ZIP) (Dataset)
WRAP_dataset_162247.zip - Published Version
Available under License Creative Commons: Attribution-Noncommercial 4.0.

Download (306Mb)


Abstract

The definition of bacterial species is traditionally a taxonomic issue, while bacterial populations are identified by population genetics. These assignments are species-specific and depend on the practitioner. Legacy multilocus sequence typing (MLST) is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST (cgMLST) allelic profiles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate HierCC's ability to correctly assign 100,000s of genomes to the species/subspecies and population levels for Salmonella, Escherichia, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI was uniformly consistent with the classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and with O serogroups in Salmonella. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
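
To make the idea of clustering allelic profiles at several levels concrete, the following toy sketch groups a handful of invented profiles at three distance thresholds. It is an illustration only and not EnteroBase's implementation: it assumes single-linkage clustering on the number of differing loci, assumes that a value of 0 marks a missing allele call, and uses hypothetical genome names (`g1` to `g4`) and hypothetical helper functions (`allelic_distance`, `single_linkage`).

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci where both profiles have a called allele and the alleles differ.

    Assumption for this sketch: allele number 0 means "missing" and is ignored.
    """
    return sum(1 for x, y in zip(a, b) if x and y and x != y)


def single_linkage(profiles, threshold):
    """Map each genome name to a cluster label at the given distance threshold.

    Two genomes share a cluster if a chain of pairwise distances, each at most
    `threshold`, connects them. The label is the smallest name in the cluster.
    """
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
    return {n: find(n) for n in names}


# Invented six-locus profiles; real cgMLST schemes contain far more loci.
profiles = {
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 1, 2],
    "g3": [1, 1, 1, 1, 3, 2],
    "g4": [5, 6, 7, 8, 9, 4],
}

for level in (0, 1, 6):
    print(f"threshold {level}:", single_linkage(profiles, level))
```

In this sketch, raising the threshold merges clusters but never splits them, which is what makes the assignments hierarchical: tight thresholds separate closely related genomes, while loose thresholds group distant ones.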

Item Type: Dataset
Subjects: Q Science > QH Natural history
Q Science > QR Microbiology
Divisions: Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School
Type of Data: Computational data
Library of Congress Subject Headings (LCSH): Bacterial genomes, Salmonella -- Genetics, Escherichia -- Genetics, Yersinia -- Genetics, Vibrio -- Genetics, Streptococcus -- Genetics
Publisher: Warwick Medical School, University of Warwick
Official Date: 4 April 2022
Dates:
Date | Event
26 January 2022 | Created
4 April 2022 | Available
4 April 2022 | Published
Status: Not Peer Reviewed
Publication Status: Published
Media of Output: .gz, .fasta*, .presence*, .cgMLST, .xlsx, .pdf and .docx
Access rights to Published version: Open Access
Copyright Holders: University of Warwick
Description:

The full set of deposited files is listed below:

Data Files:
Cdiff.2021.HC5.cgMLST
Cdiff.2021.HC5.cgMLST.alleles.fasta.gz
Cdiff.2021.HC5.wgMLST_detail.presence.gz
EscherichiaReps.cgMLST.alleles.fasta.gz
EscherichiaReps.cgMLST
EscherichiaReps.wgMLST_detail.presence.gz
Salmonella.2021.HC5_lg4_OR_HC400.cgMLST
Salmonella.2021.HC5_lg4_OR_HC400.cgMLST.alleles.fasta.gz
Salmonella.2021.HC5_lg4_OR_HC400.wgMLST_detail.presence.gz
Streptococcus.2021.HC5_lg4_OR_HC50.cgMLST
Streptococcus.2021.HC5_lg4_OR_HC50.cgMLST.alleles.fasta.gz
Streptococcus.2021.HC5_lg4_OR_HC50.wgMLST_detail.presence.gz
Vibrio.2021.HC5.cgMLST
Vibrio.2021.HC5.cgMLST.alleles.fasta.gz
Vibrio.2021.HC5.wgMLST_detail.presence.gz
Yersinia.2021.HC5.cgMLST
Yersinia.2021.HC5.cgMLST.alleles.fasta.gz
Yersinia.2021.HC5.wgMLST_detail.presence.gz

For details, refer to the Dataset (ZIP), Readme and Metadata files.
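
Several of the listed files are gzip-compressed FASTA files (`*.alleles.fasta.gz`). As a minimal sketch of how such a file could be read once the ZIP archive is unpacked, the snippet below parses FASTA records with the Python standard library only. The helper name `iter_fasta`, the demo file, and its record names are hypothetical; the header layout used in the deposited files and their location inside the archive are not described on this page, so consult the Readme before relying on them.

```python
import gzip
import os
import tempfile


def iter_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Demo on a tiny synthetic file with invented record names. To read a deposited
# file, pass its unpacked path (for example a *.alleles.fasta.gz file) instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.alleles.fasta.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(">locusA_1\nATGC\nGGTA\n>locusA_2\nATGA\n")
    records = list(iter_fasta(demo_path))

print(records)
```

Reading the file as a stream keeps memory use low, which may matter for the larger allele files in the 306Mb archive.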

RIOXX Funder/Project Grant:
Project/Grant ID | RIOXX Funder Name | Funder ID
202792/Z/16/Z | Wellcome Trust | http://dx.doi.org/10.13039/100010269
Related URLs:
  • Other
  • Related item in WRAP
Contributors:
Contribution | Name | Contributor ID
Depositor | Baxter, Laura | 2888
