A temporal precedence based clustering method for gene expression microarray data
Krishna, Ritesh V., Li, Chang-Tsun and Buchanan-Wollaston, Vicky. (2010) A temporal precedence based clustering method for gene expression microarray data. BMC Bioinformatics, Vol.11. Article 68. ISSN 1471-2105
Official URL: http://dx.doi.org/10.1186/1471-2105-11-68
Abstract
Background: Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data is grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures which may not be suitable for clustering of temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time-series by statistically testing if one time-series can be used for forecasting the other time-series or not.

Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system.

Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits.
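The two steps named in the abstract (pairwise Granger causality tests feeding a gene-association matrix, then a graph-theoretic grouping of that matrix) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a fixed lag, an ordinary least squares F statistic, and a user-chosen critical value `f_crit` in place of a p-value. Plain connected components stand in for the paper's "highly connected components" detection, which the abstract does not specify. All function names and parameters are hypothetical.

```python
import numpy as np

def rss(design, target):
    """Residual sum of squares of an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(x, y, lag=1):
    """F statistic for the hypothesis that past x helps forecast y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y) - lag
    target = y[lag:]
    # Column k-1 holds the series shifted back by k time steps.
    y_lags = np.column_stack([y[lag - k:len(y) - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k:len(x) - k] for k in range(1, lag + 1)])
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, y_lags])       # y explained by its own past
    full = np.hstack([ones, y_lags, x_lags])     # ... plus the past of x
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = n - full.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

def association_matrix(expr, lag=1, f_crit=4.0):
    """Boolean matrix with adj[i, j] True when gene i appears to precede gene j.

    f_crit is an assumed critical value, not a figure taken from the paper.
    """
    g = len(expr)
    adj = np.zeros((g, g), dtype=bool)
    for i in range(g):
        for j in range(g):
            if i != j:
                adj[i, j] = granger_f(expr[i], expr[j], lag) > f_crit
    return adj

def components(adj):
    """Connected components of the association graph, ignoring edge direction."""
    und = adj | adj.T
    seen, groups = set(), []
    for start in range(len(und)):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in np.flatnonzero(und[node]):
                nb = int(nb)
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups
```

Calling `components(association_matrix(expr))` on an array with one row per gene and one column per time point returns lists of gene indices. A fuller treatment would convert the F statistic to a p-value and correct for multiple testing across all gene pairs.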
| Item Type: | Journal Article |
|---|---|
| Subjects: | Q Science > QK Botany; Q Science > QH Natural history > QH426 Genetics |
| Divisions: | Faculty of Science > Computer Science; Faculty of Science > Life Sciences (2010- ) > Warwick HRI (2004-2010) |
| Library of Congress Subject Headings (LCSH): | Gene expression -- Research, DNA microarrays -- Research, Bioinformatics, Arabidopsis thaliana -- Genetics |
| Journal or Publication Title: | BMC Bioinformatics |
| Publisher: | BioMed Central Ltd. |
| ISSN: | 1471-2105 |
| Date: | 30 January 2010 |
| Volume: | Vol.11 |
| Page Range: | Article 68 |
| Identification Number: | 10.1186/1471-2105-11-68 |
| Status: | Peer Reviewed |
| Access rights to Published version: | Open Access |
| URI: | http://wrap.warwick.ac.uk/id/eprint/2993 |