README for WRAP dataset 162247, http://wrap.warwick.ac.uk/162247/

Manuscript Title: EnteroBase: Hierarchical clustering of 100,000s of bacterial genomes into species/sub-species and populations

Authors:
Achtman, Mark; University of Warwick, Warwick Medical School
Zhou, Zhemin; University of Warwick, Warwick Medical School
Charlesworth, Jane; University of Warwick, Warwick Medical School
Baxter, Laura; University of Warwick, Bioinformatics Research Technology Platform

Accepted for publication by Phil Trans Roy Soc B, March 2022

The accepted manuscript is deposited on WRAP: http://wrap.warwick.ac.uk/164101/

The full set of deposited files is listed below:

Data Files:
Cdiff.2021.HC5.cgMLST
Cdiff.2021.HC5.cgMLST.alleles.fasta.gz
Cdiff.2021.HC5.wgMLST_detail.presence.gz
EscherichiaReps.cgMLST
EscherichiaReps.cgMLST.alleles.fasta.gz
EscherichiaReps.wgMLST_detail.presence.gz
Salmonella.2021.HC5_lg4_OR_HC400.cgMLST
Salmonella.2021.HC5_lg4_OR_HC400.cgMLST.alleles.fasta.gz
Salmonella.2021.HC5_lg4_OR_HC400.wgMLST_detail.presence.gz
Streptococcus.2021.HC5_lg4_OR_HC50.cgMLST
Streptococcus.2021.HC5_lg4_OR_HC50.cgMLST.alleles.fasta.gz
Streptococcus.2021.HC5_lg4_OR_HC50.wgMLST_detail.presence.gz
Vibrio.2021.HC5.cgMLST
Vibrio.2021.HC5.cgMLST.alleles.fasta.gz
Vibrio.2021.HC5.wgMLST_detail.presence.gz
Yersinia.2021.HC5.cgMLST
Yersinia.2021.HC5.cgMLST.alleles.fasta.gz
Yersinia.2021.HC5.wgMLST_detail.presence.gz

Trees were computed using the data files, as described in the Materials and Methods:

*.alleles.fasta.gz files contain the nucleotide sequences for all alleles in the cgMLST scheme for that genus

*.cgMLST is the matrix of cgMLST allelic profiles used to build the Maximum Likelihood super-tree from SNPs among core genes
    rows = Strain ID
    columns = Gene ID
    values = allelic number

*.wgMLST_detail.presence.gz is the matrix used to build the Maximum Likelihood super-tree from wgMLST sequence presence/absence
    rows = Strain ID
    columns = Gene ID
    values = 1/0 denotes presence/absence

The cgMLST trees are shown in the following figures:
Fig 1, Salmonella: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53257
Fig 2, Escherichia: https://enterobase.warwick.ac.uk/ms_tree?tree_id=52101
Fig 3, Clostridioides: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53253
Fig S1, Yersinia: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53269
Fig S2, Vibrio: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53265
Fig S3, Streptococcus: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53261

The wgMLST presence/absence trees are linked in the figure legends as follows:
Fig 1, Salmonella: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53258
Fig 2, Escherichia: https://enterobase.warwick.ac.uk/ms_tree?tree_id=71125
Fig 3, Clostridioides: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53254
Fig S1, Yersinia: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53270
Fig S2, Vibrio: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53266
Fig S3, Streptococcus: https://enterobase.warwick.ac.uk/ms_tree?tree_id=53262

Supplementary Materials which accompany the accepted manuscript are also included:
SupplementalMaterialOverview.docx
SupplementalText.pdf
Table_S1.xlsx
Table_S2.xlsx
Table_S3.xlsx
Table_S4.pdf
Table_S5.xlsx
Table_S6.xlsx
Table_S7.xlsx
Table_S8.xlsx
Table_S9.xlsx
Table_S10.xlsx
Fig_S1.pdf
Fig_S2.pdf
Fig_S3.pdf
Fig_S4.pdf
Fig_S5.pdf
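
Example: reading the matrix files

The *.cgMLST and *.wgMLST_detail.presence.gz files are described in this README as matrices with rows = Strain ID, columns = Gene ID. The sketch below shows one way such a matrix might be loaded and summarised with the Python standard library. It is an illustration only, not the procedure used to build the trees. The on-disk layout is an assumption that this README does not confirm: tab-delimited text, a header row of gene IDs, and the strain ID in the first column. Check a few lines of a file before relying on it. The helper names (open_text, read_matrix, gene_presence_counts) and the toy data are hypothetical.

```python
import csv
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed file as text, chosen by the .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_matrix(handle, delimiter="\t"):
    """Read a strain-by-gene matrix from an open text handle.

    ASSUMPTION (not stated in the README): delimited text with a header
    row of gene IDs and the strain ID in the first column.
    Returns (gene_ids, {strain_id: [value, ...]}), values kept as strings.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    gene_ids = header[1:]
    rows = {}
    for record in reader:
        if not record:  # skip blank lines
            continue
        rows[record[0]] = record[1:]
    return gene_ids, rows


def gene_presence_counts(gene_ids, rows):
    """For a 1/0 presence matrix, count the strains carrying each gene."""
    counts = dict.fromkeys(gene_ids, 0)
    for values in rows.values():
        for gene, value in zip(gene_ids, values):
            if value == "1":
                counts[gene] += 1
    return counts


# Toy data standing in for a presence/absence file (hypothetical IDs).
toy = "Strain\tgeneA\tgeneB\nS1\t1\t0\nS2\t1\t1\n"
genes, matrix = read_matrix(io.StringIO(toy))
print(gene_presence_counts(genes, matrix))  # {'geneA': 2, 'geneB': 1}
```

For a deposited file, the same functions could be applied as read_matrix(open_text("Vibrio.2021.HC5.wgMLST_detail.presence.gz")), adjusting the delimiter argument if the file turns out not to be tab-delimited. Values are kept as strings so that the same reader also works for the allelic numbers in the *.cgMLST files.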