Skip to content Skip to navigation
University of Warwick
  • Study
  • |
  • Research
  • |
  • Business
  • |
  • Alumni
  • |
  • News
  • |
  • About

University of Warwick
Publications service & WRAP

Highlight your research

  • WRAP
    • Home
    • Search WRAP
    • Browse by Warwick Author
    • Browse WRAP by Year
    • Browse WRAP by Subject
    • Browse WRAP by Department
    • Browse WRAP by Funder
    • Browse Theses by Department
  • Publications Service
    • Home
    • Search Publications Service
    • Browse by Warwick Author
    • Browse Publications service by Year
    • Browse Publications service by Subject
    • Browse Publications service by Department
    • Browse Publications service by Funder
  • Statistics
  • Help & Advice
University of Warwick

The Library

  • Login

Partial mixture model for tight clustering of gene expression time-course

Tools
- Tools
+ Tools

Yuan, Yinyin, Li, Chang-Tsun and Wilson, Roland. (2008) Partial mixture model for tight clustering of gene expression time-course. BMC Bioinformatics, Vol.9 (No.287). ISSN 1471-2105

[img]
Preview
PDF
WRAP_Yuan_Partial_Mixture.pdf - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader

Download (909Kb)
Official URL: http://dx.doi.org/10.1186/1471-2105-9-287

Abstract

Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the ombination of both partial mixture model and minimum distance estimator in this field. We show that tight clustering not only is capable to generate more profound understanding of the dataset under study well in accordance to established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidences that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion.

Item Type: Journal Article
Subjects: Q Science > QR Microbiology
Divisions: Faculty of Science > Computer Science
Library of Congress Subject Headings (LCSH): Gene expression
Journal or Publication Title: BMC Bioinformatics
Publisher: BioMed Central Ltd.
ISSN: 1471-2105
Date: 18 June 2008
Volume: Vol.9
Number: No.287
Identification Number: 10.1186/1471-2105-9-287
Status: Peer Reviewed
Access rights to Published version: Restricted or Subscription Access
References: 1. Boutros PC, Okey AB: Unsupervised pattern recognition: An introduction to the whys and wherefores of clustering microarray data. Brief Bioinform 2005, 6(4):331-343. 2. Ji H, Wong WH: Computational Biology: Toward Deciphering Gene Regulatory Information in Mammalian Genomes. Biometrics 2006, 62(19):645-663. 3. Luan Y, Li H: Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics 2003, 19(4):474-482. 4. Ng SK, Mclachlan GJ, Wang K, Jones LBT, Ng SW: A Mixture model with random-effects components for clustering correlated gene-expression profiles. Bioinformatics 2006, 22(14):1745-1752. 5. Wu FX, Zhang WJ, Kusalik AJ: Dynamic model-based clustering for time-course gene expression data. J Bioinform Comput Biol 2005, 3(4):821-836. 6. Heard NA, Holmes CC, Stephens DA: A quantitative study of gene regulation involved in the immune response of Anopheline mosquitoes: An application of Bayesian hierarchical clustering of curves. Journal of the American Statistical Association 2006, 101(473):18-29. 7. Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene expression data with repeated measurements. Genome Biology 2003, 4(5):R34. 8. Thalamuthu A, Mukhopadhyay I, Zheng X, Tseng GC: Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics 2006, 22(19):2405-2412. 9. Fraley C, Raftery AE: Enhanced Model-Based Clustering, Density Estimation, and Discriminant Analysis Software: MCLUST. Journal of Classification 2003, 20(2):263-286. 10. Wakefield J, Zhou C, Self G: Modelling gene expression data over time: Curve clustering with informative prior distributions. Bayesian Statistics 2003. 11. Fraley C, Raftery AE: How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal 1998, 41(8):578-588. 12. Beran R: Minimum distance procedures. Handbook of Statistics 1984, 4:741-754. 13. Scott DW: Parametric statistical modeling by minimum integrated square error. Technometrics 2001, 43(3):274-285. 14. Tseng GC, Wong WH: Tight Clustering: A Resampling-Based Approach for Identifying Stable and Tight Patterns in Data. Biometrics 2005, 61:10-16. 15. Bar-Joseph Z, Gerber G, Gifford DK, Jaakkola TS, Simon I: A new approach to analyzing gene expression time series data. Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB 2002:39-48. 16. Ma P, Castillo-Davis CI, Zhong W, Liu JS: A data-driven clustering method for time course gene expression data. Nucleic Acids Research 2006, 34(4):1261-1269. 17. Tjaden B: An approach for clustering gene expression data with error information. BMC Bioinformatics 2006, 7:17. 18. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel- Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. Nat Genet 2000, 25:25-29. 19. Parzen E: On the estimation of a probability density function and mode. Annals of Mathematical Statistics 1962, 33:1065-1076. 20. Zacks S: Parametric Statistical Inference Pergamon Press; 1981. 21. Mayoral L: Minimum distance estimation of stationary and non-stationary ARFIMA processes. The Econometrics Journal 2007, 10:124-148. 22. Garcia-Dorado A, Gallego A: Comparing Analysis Methods for Mutation-Accumulation Data: A Simulation Study. Genetics 2003, 164(2):807-819. 23. Parr WC, Schucany WR: Minimum Distance and Robust Estimation. Journal of the American Statistical Association 1980, 75(371):616-624. 24. Wand MP, Jones MC: Kernel Smoothing. Monographs on Statistics and Applied Probability London: Chapman and Hall; 1995. 25. Basu A, Harris I, Hjort N, Jones M: Robust and efficient estimation by minimising a density power divergence. Biometrika 1998, 85:549-559. 26. Yeung K, Fraley C, Murua A, Raftery A, Ruzzo W: Model-based clustering and data transformations for gene expression data. Bioinformatics 2001, 17(10):977-987. 27. Calinski T, Harabasz J: A dendrite method for cluster analysis. Comm Statist 1974, 3:1-27. 28. Hubert L, Arabie P: Comparing partitions. Journal of Classification 1985, 2:193-218. 29. Medvedovic M, Yeung KY, Bumgarner RE: Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 2004, 20(8):1222-1232. 30. Schliep A, Costa IG, Steinhoff C, Schonhuth A: Analyzing gene expression time-courses. IEEE/ACM Trans Comput Biol Bioinform 2005, 2(3):179-193. 31. Dojer N, Gambin A, Mizera A, Wilczynski B, Tiuryn J: Applying dynamic Bayesian networks to perturbed gene expression data. BMC Bioinformatics 2006:7. 32. Jiang D, Pei J, Ramanathan M, Tang C, Zhang A: Mining coherent gene clusters from gene-sample-time microarray data. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining New York, NY, USA: ACM Press; 2004:430-439. 33. Qin L, Self SG: The clustering of regression models method with applications in gene expression data. Biometrics 2006, 62(2):526-533. 34. Ernst J, Nau GJ, Bar-Joseph Z: Clustering short time series gene expression data. Bioinformatics 2005, 21(SUPPL. 1):. 35. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW: A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell 1998, 2:65-73. 36. Spellman P, Sherlock G, Zhang M, Iyer V, Anders K, Eisen M, Brown P, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273-97. 37. Yuan Y, Li CT: Unsupervised Clustering of Gene Expression Time Series with Conditional Random Fields. Proceedings of IEEE Workshop on Biomedical Applications for Digital Ecosystems 2007. 38. Fraley C, Raftery A: Model-Based Clustering, Discriminant Analysis, and Density Estimation. Journal of the American Statistical Association 2002, 97(458):611-631. 39. Tavazoie S, Hughes J, Campbell M, Cho R, Church G: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281-285. 40. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society 1995, B(57):289-300. 41. Fraley C, Raftery AE: MCLUST version 3: an R package for normal mixture modeling and modelbased clustering. Technical Report 504, Department of Statistics, University of Washington, Seattle 2006. 42. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network. Science 2001, 292(5518):929-934.
URI: http://wrap.warwick.ac.uk/id/eprint/532

Data sourced from Thomson Reuters' Web of Knowledge

Request changes to a record

Actions (login required)

View Item View Item

Document Downloads

More statistics for this item...
twitter

Email us: publications@warwick.ac.uk
Contact Details
About Us