Seqenv: linking sequences to environments through text mining
Sinclair, Lucas, Ijaz, Umer Z., Jensen, Lars Juhl, Coolen, Marco J.L., Gubry-Rangin, Cecile, Chroňáková, Alica, Oulas, Anastasis, Pavloudi, Christina, Schnetzer, Julia, Weimann, Aaron, Ijaz, Ali, Eiler, Alexander, Quince, Christopher and Pafilis, Evangelos (2016) Seqenv: linking sequences to environments through text mining. PeerJ, 4. e2690. doi:10.7717/peerj.2690 ISSN 2167-8359.
PDF: WRAP_peerj-2690.pdf (Published Version, 2841Kb). Available under a Creative Commons Attribution 4.0 license.
Official URL: http://dx.doi.org/10.7717/peerj.2690
Abstract
Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts (if it is available) the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia-oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography.
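The last two steps described in the abstract (tagging isolation-source text with environment terms, then summarizing a sample by weighted summation) can be illustrated with a minimal sketch. This is not seqenv's actual implementation: the keyword table `TOY_VOCAB`, the function names, the sequence identifiers, and the choice to weight by read abundance and normalize to fractions are all assumptions made for illustration. The real tool draws its hits from similarity searches against the NCBI "nt" database and matches text against the full EnvO vocabulary.

```python
from collections import Counter, defaultdict

# Hypothetical mini-vocabulary standing in for EnvO: keyword -> environment label.
TOY_VOCAB = {
    "soil": "soil",
    "sediment": "sediment",
    "lake": "lake",
    "seawater": "sea water",
}


def tag_terms(text):
    """Return the environment labels whose keyword appears as a word in text."""
    words = text.lower().replace(",", " ").split()
    return [TOY_VOCAB[w] for w in words if w in TOY_VOCAB]


def sequence_profile(isolation_sources):
    """Relative frequency of each environment label over one sequence's hits."""
    counts = Counter()
    for text in isolation_sources:
        counts.update(tag_terms(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def sample_profile(profiles, abundances):
    """Abundance-weighted sum of per-sequence profiles, normalized to sum to 1."""
    weighted = defaultdict(float)
    for seq_id, profile in profiles.items():
        weight = abundances.get(seq_id, 0)
        for term, freq in profile.items():
            weighted[term] += weight * freq
    total = sum(weighted.values())
    return {term: v / total for term, v in weighted.items()} if total else {}


# Made-up isolation-source strings for two hypothetical sequences.
hits = {
    "otu1": ["forest soil", "agricultural soil", "lake sediment"],
    "otu2": ["lake water"],
}
# Made-up read counts of each sequence in one sample.
abundances = {"otu1": 30, "otu2": 10}

profiles = {seq_id: sequence_profile(texts) for seq_id, texts in hits.items()}
sample = sample_profile(profiles, abundances)
print(sample)
```

In this toy input, `otu1` is split between soil, lake, and sediment, while `otu2` maps only to lake, so the more abundant `otu1` dominates the soil and sediment shares of the sample summary.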
| Field | Value |
|---|---|
| Item Type | Journal Article |
| Subjects | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software; Q Science > QR Microbiology |
| Divisions | Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School > Biomedical Sciences; Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School > Biomedical Sciences > Microbiology & Infection; Faculty of Science, Engineering and Medicine > Medicine > Warwick Medical School |
| Library of Congress Subject Headings (LCSH) | Microbial ecology -- Computer programs, Bacteria -- Classification -- Databases, Algorithms, Archaebacteria, Plankton |
| Journal or Publication Title | PeerJ |
| Publisher | PeerJ, Ltd. |
| ISSN | 2167-8359 |
| Official Date | 20 December 2016 |
| Volume | 4 |
| Article Number | e2690 |
| DOI | 10.7717/peerj.2690 |
| Status | Peer Reviewed |
| Publication Status | Published |
| Access rights to Published version | Open Access (Creative Commons) |
| Date of first compliant deposit | 16 January 2017 |
| Date of first compliant Open Access | 16 January 2017 |
| Funder | Sweden. Stiftelsen för strategisk forskning [Foundation for strategic research] (SSF); Natural Environment Research Council (Great Britain) (NERC); Novo Nordisk Foundation; Seventh Framework Programme (European Commission) (FP7); Medical Research Council (Great Britain) (MRC) |
| Grant number | ICA10-0015 (SSF); NE/L011956/1, NE/J019151/1 (NERC); NNF14CC0001 (Novo Nordisk Foundation); Grant #264089, 384676-94/GSRT/NSRF C&E (FP7); MR/L015080/1, MR/M50161X/1 (MRC) |