Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements
Cooke, Emma J., Savage, Richard S., Kirk, Paul, Darkins, Robert and Wild, David L. (2011) Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements. BMC Bioinformatics, Vol.12 (No.1). p. 399. doi:10.1186/1471-2105-12-399 ISSN 1471-2105.
Official URL: http://dx.doi.org/10.1186/1471-2105-12-399
Abstract
Background
Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques.
Results
We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles.
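To make the idea of clustering with Gaussian process regression concrete, the following minimal sketch compares the marginal likelihood of two expression profiles sharing one latent Gaussian process against the likelihood of each profile having its own. It is an illustration only, not the authors' implementation: the function names, the squared-exponential kernel, the fixed hyperparameters and the synthetic profiles are all assumptions, and the mixture-model outlier likelihood, the empirical Bayes noise prior and the full hierarchical merging procedure described above are omitted.

```python
import numpy as np

def se_kernel(t1, t2, signal_var, length_scale):
    """Squared-exponential covariance between two sets of time points."""
    diff = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_log_marginal(times, profiles, signal_var=1.0, length_scale=2.0, noise_var=0.1):
    """Log marginal likelihood of a cluster whose genes share one latent function.

    profiles has shape (n_genes, n_times); hyperparameters are fixed here
    (an assumption), whereas a real method would optimise or infer them.
    """
    n_genes, _ = profiles.shape
    t = np.tile(times, n_genes)   # time point of every stacked observation
    y = profiles.ravel()          # gene 0's series, then gene 1's, and so on
    K = se_kernel(t, t, signal_var, length_scale) + noise_var * np.eye(t.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * y.size * np.log(2 * np.pi)

# Synthetic, non-uniformly sampled time course (hypothetical data).
times = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
rng = np.random.default_rng(0)
rising = np.tanh(times - 2.0)
gene_a = rising + 0.1 * rng.standard_normal(times.size)
gene_b = rising + 0.1 * rng.standard_normal(times.size)
gene_c = -rising + 0.1 * rng.standard_normal(times.size)

def merge_score(x, y):
    """Log likelihood ratio: one shared cluster versus two separate clusters."""
    merged = gp_log_marginal(times, np.vstack([x, y]))
    separate = gp_log_marginal(times, x[None, :]) + gp_log_marginal(times, y[None, :])
    return merged - separate

print("similar profiles:", merge_score(gene_a, gene_b))
print("opposite profiles:", merge_score(gene_a, gene_c))
```

Under these assumptions, the pair with similar profiles scores higher for merging than the pair with opposite profiles. Because the kernel is evaluated directly on the time stamps, unevenly spaced sampling needs no special handling.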
Conclusions
By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. Timeseries BHC is provided as part of the R package 'BHC' (version 1.5), which can be downloaded from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
Item Type: | Journal Article
---|---
Subjects: | Q Science > QH Natural history
Divisions: | Faculty of Science, Engineering and Medicine > Science > Chemistry; Faculty of Science, Engineering and Medicine > Research Centres > Warwick Systems Biology Centre
Library of Congress Subject Headings (LCSH): | Molecular biology -- Data processing, Time-series analysis, Genes -- Mathematical models
Journal or Publication Title: | BMC Bioinformatics
Publisher: | BioMed Central
ISSN: | 1471-2105
Official Date: | 13 October 2011
Volume: | Vol.12
Number: | No.1
Page Range: | p. 399
DOI: | 10.1186/1471-2105-12-399
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Open Access (Creative Commons)
Date of first compliant deposit: | 19 December 2015
Date of first compliant Open Access: | 19 December 2015
Funder: | Engineering and Physical Sciences Research Council (EPSRC), Medical Research Council (Great Britain) (MRC), University of Warwick. Molecular Organisation and Assembly in Cells
Grant number: | EP/F027400/1 (EPSRC)