Bayesian correlated clustering to integrate multiple datasets
Kirk, Paul, Griffin, Jim E., Savage, Richard S., Ghahramani, Zoubin and Wild, David L. (2012) Bayesian correlated clustering to integrate multiple datasets. Bioinformatics, Volume 28 (Number 4). pp. 3290-3297. doi:10.1093/bioinformatics/bts595 ISSN 1367-4803.
Official URL: http://dx.doi.org/10.1093/bioinformatics/bts595
Abstract
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets.
Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods.
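The abstract says only that the per-dataset DMA mixture models are linked "via parameters that describe the agreement among the datasets". As we understand the full paper, for each item i the joint prior over its cluster labels c_i1, ..., c_iK in the K datasets is proportional to the product of the mixture weights π_{c_ik, k}, multiplied by a factor (1 + φ_kl) for every pair of datasets k < l in which the item's labels agree. The sketch below assumes exactly that form. It uses a fixed, finite number of clusters per dataset and omits the Dirichlet priors on the weights, the data likelihoods and the Gibbs sampler. The function names `allocation_weight` and `joint_allocation_prior` are hypothetical and do not come from the authors' software.

```python
import itertools

def allocation_weight(labels, pis, phis):
    """Unnormalised prior weight for one item's labels across K datasets.

    labels: tuple of K ints; labels[k] is the item's cluster in dataset k.
    pis:    list of K lists; pis[k][c] is the mixture weight of cluster c in dataset k.
    phis:   dict mapping a dataset pair (k, l) with k < l to an agreement parameter >= 0.
    """
    weight = 1.0
    # Independent mixture-model contribution from each dataset.
    for k, c in enumerate(labels):
        weight *= pis[k][c]
    # Assumed coupling: each agreeing pair of datasets boosts the weight by (1 + phi_kl).
    for k, l in itertools.combinations(range(len(labels)), 2):
        if labels[k] == labels[l]:
            weight *= 1.0 + phis[(k, l)]
    return weight

def joint_allocation_prior(pis, phis):
    """Normalised prior over every label combination (only feasible for tiny examples)."""
    grid = itertools.product(*(range(len(p)) for p in pis))
    weights = {labels: allocation_weight(labels, pis, phis) for labels in grid}
    total = sum(weights.values())
    return {labels: w / total for labels, w in weights.items()}

def prob_same_label(pis, phi):
    """Prior probability that an item shares its label in two datasets."""
    prior = joint_allocation_prior(pis, {(0, 1): phi})
    return sum(p for (a, b), p in prior.items() if a == b)

uniform = [[0.5, 0.5], [0.5, 0.5]]
print(prob_same_label(uniform, 0.0))  # no coupling: labels independent
print(prob_same_label(uniform, 3.0))  # positive phi favours agreement
```

With two clusters of equal weight in each of two datasets, φ = 0 gives a prior agreement probability of 0.5 (the datasets are modelled independently), while φ = 3 raises it to 0.8, because each agreeing combination carries weight 0.25 × 4 = 1 against 0.25 for each disagreeing one. In this formulation, a large inferred φ is read as evidence that two datasets share clustering structure.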
Item Type: Journal Article
Subjects: Q Science > QA Mathematics; Q Science > QH Natural history > QH301 Biology
Divisions: Faculty of Science, Engineering and Medicine > Research Centres > Warwick Systems Biology Centre
Library of Congress Subject Headings (LCSH): Biology -- Data processing; Data integration (Computer science); Cluster analysis
Journal or Publication Title: Bioinformatics
Publisher: Oxford University Press
ISSN: 1367-4803
Official Date: 2012
Volume: 28
Number: 4
Page Range: pp. 3290-3297
DOI: 10.1093/bioinformatics/bts595
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access (Creative Commons)
Date of first compliant deposit: 23 December 2015
Date of first compliant Open Access: 23 December 2015
Funder: Engineering and Physical Sciences Research Council (EPSRC); Medical Research Council (Great Britain) (MRC)
Grant number: EP/I036575/1 (EPSRC)