University of Warwick
Publications service & WRAP

Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures


Rasmussen, Carl Edward, De la Cruz, Bernard J., Ghahramani, Zoubin and Wild, David L. (2009) Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol.6 (No.4). pp. 615-628. ISSN 1545-5963

PDF: WRAP_Wild_Gene_expression.pdf (4 MB)
Official URL: http://dx.doi.org/10.1109/TCBB.2007.70269

Abstract

Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of this data.
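
The similarity-to-dissimilarity step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that posterior samples of cluster assignments from some DPM sampler are already available as an integer array (one row per sample, one column per gene), that the dissimilarity is taken as one minus the co-clustering probability, and that average linkage is used; the function names `coclustering_probability` and `linkage_from_samples` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def coclustering_probability(assignments):
    """Estimate P(gene i and gene j share a cluster) from posterior samples.

    assignments: array of shape (n_samples, n_genes) holding integer cluster
    labels, one row per posterior sample (assumed to come from a DPM sampler).
    """
    assignments = np.asarray(assignments)
    n_samples, n_genes = assignments.shape
    prob = np.zeros((n_genes, n_genes))
    for labels in assignments:
        # Pairwise indicator: 1 where two genes carry the same label in this sample.
        prob += labels[:, None] == labels[None, :]
    return prob / n_samples


def linkage_from_samples(assignments, method="average"):
    """Turn co-clustering probabilities into a dissimilarity and build a dendrogram."""
    prob = coclustering_probability(assignments)
    dissim = 1.0 - prob              # assumed dissimilarity: 1 - similarity
    np.fill_diagonal(dissim, 0.0)    # a gene is at distance 0 from itself
    condensed = squareform(dissim, checks=False)
    return prob, linkage(condensed, method=method)


# Toy data (fabricated for illustration): 4 posterior samples over 4 genes.
samples = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
])
prob, tree = linkage_from_samples(samples)
flat = fcluster(tree, t=2, criterion="maxclust")
print(prob)
print(flat)
```

Because cluster labels are only meaningful within a single sample, the sketch compares labels pairwise inside each sample rather than across samples. The resulting linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` for visualization.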

Item Type: Journal Article
Subjects: Q Science > QA Mathematics
Q Science > QH Natural history > QH426 Genetics
Divisions: Faculty of Science > Centre for Systems Biology
Library of Congress Subject Headings (LCSH): Bioinformatics, Gaussian distribution, Stochastic processes, Statistics -- Data processing, Monte Carlo method, Probability measures, Bayesian statistical decision theory
Journal or Publication Title: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publisher: IEEE
ISSN: 1545-5963
Date: October 2009
Volume: 6
Number: 4
Page Range: 615-628
Identification Number: 10.1109/TCBB.2007.70269
Status: Peer Reviewed
Access rights to Published version: Open Access
URI: http://wrap.warwick.ac.uk/id/eprint/2694
