Partial mixture model for tight clustering of gene expression time-course
Yuan, Yinyin, Li, Chang-Tsun and Wilson, Roland (2008) Partial mixture model for tight clustering of gene expression time-course. BMC Bioinformatics, Vol.9 (No.287). ISSN 1471-2105

Official URL: http://dx.doi.org/10.1186/1471-2105-9-287

Abstract

Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, little work in the literature has been dedicated to this area of research. Moreover, model parameters have mostly been estimated with maximum likelihood techniques, while the minimum distance estimator has been largely ignored.
Results: In this paper we show that the minimum distance estimator is inherently robust, which makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, we formulate a partial mixture model that naturally incorporates replicate information and allows for scattered genes. Experiments on fitting simulated data show that the minimum distance estimator outperforms the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Compared with four other popular clustering algorithms, our proposed partial regression clustering algorithm achieves the top score in a Gene Ontology-driven evaluation.
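To illustrate why a minimum distance criterion tolerates scattered points, the Python sketch below fits a single partial Gaussian component w·N(μ, σ²) to one-dimensional data by minimising the integrated squared error (L2E) criterion of Scott (2001), ∫ f² dx − (2/n) Σ f(xᵢ). This is an illustrative toy under stated assumptions, not the authors' partial regression clustering algorithm: it uses one univariate component instead of regression curves with replicates, a brute-force grid search instead of a numerical optimiser, and synthetic data. The function names `gauss_pdf` and `l2e_partial_fit` are hypothetical.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def l2e_partial_fit(data, mus, sigmas):
    """Hypothetical helper: fit one partial component w * N(mu, sigma^2) by
    grid search, minimising the L2E criterion
        w^2 / (2 sqrt(pi) sigma) - (2 w / n) * sum_i N(x_i; mu, sigma^2).
    For fixed (mu, sigma) the optimal weight is w = 2 sqrt(pi) sigma * S,
    where S is the mean density at the data points, so the criterion reduces
    to -2 sqrt(pi) sigma * S^2 and only (mu, sigma) need to be searched.
    """
    n = len(data)
    best = None
    for mu in mus:
        for sigma in sigmas:
            s = sum(gauss_pdf(x, mu, sigma) for x in data) / n
            w = 2.0 * math.sqrt(math.pi) * sigma * s
            crit = -2.0 * math.sqrt(math.pi) * sigma * s * s
            if best is None or crit < best[0]:
                best = (crit, w, mu, sigma)
    _, w, mu, sigma = best
    return w, mu, sigma


# Synthetic data (assumption): 80 points in a tight cluster plus 20 scattered points.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(80)]
data += [rng.gauss(8.0, 0.5) for _ in range(20)]

mus = [i * 0.1 for i in range(-50, 101)]     # candidate means -5.0 .. 10.0
sigmas = [0.2 + j * 0.1 for j in range(49)]  # candidate s.d. 0.2 .. 5.0
w, mu, sigma = l2e_partial_fit(data, mus, sigmas)

# Maximum likelihood fit of a single Gaussian: sample mean and s.d.
mle_mu = sum(data) / len(data)
mle_sigma = math.sqrt(sum((x - mle_mu) ** 2 for x in data) / len(data))

print(f"L2E partial fit: w={w:.2f} mu={mu:.2f} sigma={sigma:.2f}")
print(f"MLE full fit:    mu={mle_mu:.2f} sigma={mle_sigma:.2f}")
```

On this synthetic sample the L2E fit should settle near the tight cluster (μ ≈ 0, σ ≈ 1, with w close to the 0.8 fraction of points in it), leaving the scattered points unexplained. The maximum likelihood estimates, by contrast, are pulled towards the scattered points, with an inflated mean and standard deviation. The free weight w is what makes the component "partial": it lets the fit describe only part of the data.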
Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm demonstrates that combining the partial mixture model with the minimum distance estimator is well suited to this field. We show that tight clustering not only yields a deeper understanding of the dataset under study, in good agreement with established biological knowledge, but also suggests interesting new hypotheses when clustering results are interpreted. In particular, contrary to prevailing opinion, we provide biological evidence that scattered genes can be relevant and are interesting subjects for study.

| Item Type: | Journal Article |
|---|---|
| Subjects: | Q Science > QR Microbiology |
| Divisions: | Faculty of Science > Computer Science |
| Library of Congress Subject Headings (LCSH): | Gene expression |
| Journal or Publication Title: | BMC Bioinformatics |
| Publisher: | BioMed Central Ltd. |
| ISSN: | 1471-2105 |
| Date: | 18 June 2008 |
| Volume: | Vol.9 |
| Number: | No.287 |
| Status: | Peer Reviewed |
| Access rights to Published version: | Restricted or Subscription Access |