Calculating orthologs in bacteria and archaea : a divide and conquer approach
Halachev, Mihail R., Loman, Nicholas J. and Pallen, Mark J. (2011) Calculating orthologs in bacteria and archaea : a divide and conquer approach. PLoS One, Volume 6 (Number 12). Article number e28388. doi:10.1371/journal.pone.0028388 ISSN 1932-6203.
Text: WRAP_journal.pone.0028388.pdf - Published Version. Available under License Creative Commons Attribution.
Official URL: http://dx.doi.org/10.1371/journal.pone.0028388
Abstract
Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. 
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
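The saving described in the abstract can be illustrated with a toy sketch. It is not the xBASE-Orth implementation: the function names, the nested-tuple taxonomy, the `family:genome` labels, and the `same_family` test (a stand-in for a BLAST-style similarity search) are all assumptions made for illustration. Each taxonomic level keeps one representative per group, and only representatives are compared at the next level up.

```python
def all_vs_all_comparisons(genomes):
    """Number of sequence pairs an all-against-all search would compare."""
    n = sum(len(g) for g in genomes)
    return n * (n - 1) // 2


def merge(pan_genomes, same_family):
    """Merge child pan-genomes into one, comparing representatives only.

    Each pan-genome is a list of (representative, members) pairs.
    Returns (merged_pan_genome, comparisons_performed).
    """
    first, *rest = pan_genomes
    merged = [(rep, list(members)) for rep, members in first]
    comparisons = 0
    for pan in rest:
        unmatched = []
        for rep, members in pan:
            for known_rep, cluster in merged:
                comparisons += 1
                if same_family(known_rep, rep):
                    cluster.extend(members)  # join an existing group
                    break
            else:
                unmatched.append((rep, list(members)))  # new group
        # Entries from the same child were already resolved one level down,
        # so they are only added after the whole child has been processed.
        merged.extend(unmatched)
    return merged, comparisons


def climb(node, same_family):
    """Build a pan-genome bottom-up over a toy taxonomy.

    A leaf is a list of sequence labels (one genome, assumed to hold no
    duplicate families); an inner node is a tuple of child nodes.
    """
    if isinstance(node, list):
        return [(seq, [seq]) for seq in node], 0
    pans, total = [], 0
    for child in node:
        pan, used = climb(child, same_family)
        pans.append(pan)
        total += used
    merged, used = merge(pans, same_family)
    return merged, total + used


def same_family(a, b):
    """Hypothetical similarity test: labels share the prefix before ':'."""
    return a.split(":")[0] == b.split(":")[0]


# Toy taxonomy: two species, two genomes each (labels are made up).
SPECIES_1 = (["A:1", "B:1", "C:1"], ["A:2", "B:2", "C:2"])
SPECIES_2 = (["A:3", "B:3", "D:3"], ["A:4", "B:4", "D:4"])
TREE = (SPECIES_1, SPECIES_2)

if __name__ == "__main__":
    pan, used = climb(TREE, same_family)
    genomes = [g for species in TREE for g in species]
    print("groups:", len(pan))
    print("hierarchical comparisons:", used)
    print("all-against-all comparisons:", all_vs_all_comparisons(genomes))
```

On this toy input the hierarchical pass performs 18 comparisons against 66 for all-against-all, and recovers four groups. The greedy one-representative-per-group rule is the simplification here, and it mirrors the trade-off the abstract names: a sequence that resembles some group member but not its representative would be missed.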
Item Type: | Journal Article
---|---
Subjects: | Q Science > QH Natural history > QH301 Biology; Q Science > QH Natural history > QH426 Genetics; Q Science > QR Microbiology
Divisions: | Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School > Biomedical Sciences > Microbiology & Infection; Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School
Library of Congress Subject Headings (LCSH): | Bacterial genomes, Archaebacteria -- Research, Proteins
Journal or Publication Title: | PLoS One
Publisher: | Public Library of Science
ISSN: | 1932-6203
Official Date: | 2011
Volume: | Volume 6
Number: | Number 12
Page Range: | Article number e28388
DOI: | 10.1371/journal.pone.0028388
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Open Access (Creative Commons)
Date of first compliant deposit: | 26 December 2015
Date of first compliant Open Access: | 26 December 2015
Funder: | Biotechnology and Biological Sciences Research Council (Great Britain) (BBSRC)
Grant number: | BBE0111791 (BBSRC)