University of Warwick
Publications service & WRAP

R/BHC: fast Bayesian hierarchical clustering for microarray data

Savage, Richard S., Heller, K. (Katherine), Xu, Yang, Ghahramani, Zoubin, Truman, William M., Grant, M. (Murray), Denby, Katherine J. and Wild, David L. (2009) R/BHC: fast Bayesian hierarchical clustering for microarray data. BMC Bioinformatics, Vol.10, No.242. ISSN 1471-2105

PDF: WRAP_Savage_hr-151209-bmc_article_june09.pdf (227Kb)
Official URL: http://dx.doi.org/10.1186/1471-2105-10-242

Abstract

Background: Clustering has rapidly become a standard computational approach for analysing microarray gene expression data, yet little attention has been paid to the uncertainty in the results it produces.

Results: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge.

Conclusion: Biologically plausible results are presented from a well-studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, such as the need to decide how many clusters there should be and how to choose a principled distance metric.
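
The merge step described in the abstract can be illustrated with a small sketch. It is not the R/BHC implementation: it follows the general Bayesian hierarchical clustering recursion of Heller and Ghahramani (2005) as commonly stated, and it assumes binary data with an independent Beta-Bernoulli model per column. The value of ALPHA and every function and class name below are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations

ALPHA = 1.0  # Dirichlet Process concentration; an assumed value


def log_marginal(rows, a=1.0, b=1.0):
    """Log probability that all rows come from ONE cluster.

    Assumption: each column is an independent Beta(a, b)-Bernoulli model.
    """
    n = len(rows)
    total = 0.0
    for col in zip(*rows):
        ones = sum(col)
        total += (math.lgamma(a + ones) + math.lgamma(b + n - ones)
                  - math.lgamma(a + b + n)
                  + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return total


def logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


class Node:
    def __init__(self, rows, members, log_d, log_tree, log_r=0.0):
        self.rows = rows          # data rows under this node
        self.members = members    # original item indices
        self.log_d = log_d        # log of the DP bookkeeping term d_k
        self.log_tree = log_tree  # log p(data | tree)
        self.log_r = log_r        # log posterior probability of the merge


def leaf(i, row):
    return Node([row], [i], math.log(ALPHA), log_marginal([row]))


def merge(left, right):
    """Score the hypothesis that two subtrees form a single cluster."""
    rows = left.rows + right.rows
    log_ag = math.log(ALPHA) + math.lgamma(len(rows))  # log(alpha * Gamma(n))
    log_d = logaddexp(log_ag, left.log_d + right.log_d)
    log_pi = log_ag - log_d                            # prior for "one cluster"
    log_not_pi = left.log_d + right.log_d - log_d      # log(1 - pi)
    log_h1 = log_pi + log_marginal(rows)               # merged hypothesis
    log_h2 = log_not_pi + left.log_tree + right.log_tree  # kept-apart hypothesis
    log_tree = logaddexp(log_h1, log_h2)
    return Node(rows, left.members + right.members,
                log_d, log_tree, log_h1 - log_tree)


def bhc(data):
    """Greedy bottom-up clustering; returns (members, log_r) per merge.

    Naive version: every pair is rescored at every step.
    """
    nodes = [leaf(i, row) for i, row in enumerate(data)]
    merges = []
    while len(nodes) > 1:
        best = max((merge(x, y) for x, y in combinations(nodes, 2)),
                   key=lambda node: node.log_r)
        merged = set(best.members)
        nodes = [nd for nd in nodes if not merged & set(nd.members)]
        nodes.append(best)
        merges.append((sorted(best.members), best.log_r))
    return merges


if __name__ == "__main__":
    group_a = [1, 1, 1, 1, 0, 0, 0, 0]
    group_b = [0, 0, 0, 0, 1, 1, 1, 1]
    for members, log_r in bhc([group_a] * 3 + [group_b] * 3):
        print(members, round(math.exp(log_r), 3))
```

In this scheme a merge whose posterior probability falls below 0.5 is taken as evidence that the two subtrees are separate clusters, which is how the number of clusters can be read off the tree instead of being fixed in advance. No distance metric appears anywhere: pairs are compared only through marginal likelihoods.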

Item Type: Journal Article
Subjects: R Medicine > R Medicine (General)
Q Science > QA Mathematics
Divisions: Faculty of Science > Centre for Systems Biology
Faculty of Science > Life Sciences (2010- ) > Warwick HRI (2004-2010)
Library of Congress Subject Headings (LCSH): Bayesian statistical decision theory, Gene expression -- Statistical methods, Dirichlet series, Arabidopsis thaliana
Journal or Publication Title: BMC Bioinformatics
Publisher: BioMed Central Ltd.
ISSN: 1471-2105
Date: 6 August 2009
Volume: Vol.10
Page Range: No.242
Identification Number: 10.1186/1471-2105-10-242
Status: Peer Reviewed
Access rights to Published version: Open Access
Funder: Engineering and Physical Sciences Research Council (EPSRC), Biotechnology and Biological Sciences Research Council (Great Britain) (BBSRC), Marie Curie Fellowship Association (MCFA)
Grant number: EP/F027400/1 (EPSRC), BB/F005806/1 (BBSRC), 46444 (MCFA)
URI: http://wrap.warwick.ac.uk/id/eprint/2463
