Patient-specific data fusion defines prognostic cancer subtypes
Yuan, Yinyin, Savage, Richard S. and Markowetz, Florian. (2011) Patient-specific data fusion defines prognostic cancer subtypes. PLoS Computational Biology, Vol.7 (No.10). e1002227. ISSN 1553-7358
PDF: WRAP_Savage_journal.pcbi.1002227.pdf - Published Version (2214Kb)
Official URL: http://dx.doi.org/10.1371/journal.pcbi.1002227
Abstract
Different data types can offer complementary perspectives on the same biological phenomenon. In cancer studies, for example, data on copy number alterations indicate losses and amplifications of genomic regions in tumours, while transcriptomic data point to the impact of genomic and environmental events on the internal wiring of the cell. Fusing different data provides a more comprehensive model of the cancer cell than that offered by any single type. However, biological signals in different patients exhibit diverse degrees of concordance due to cancer heterogeneity and inherent noise in the measurements. This is a particularly important issue in cancer subtype discovery, where personalised strategies to guide therapy are of vital importance. We present a nonparametric Bayesian model for discovering prognostic cancer subtypes by integrating gene expression and copy number variation data. Our model is constructed from a hierarchy of Dirichlet Processes and addresses three key challenges in data fusion: (i) to separate concordant from discordant signals, (ii) to select informative features, and (iii) to estimate the number of disease subtypes. Concordance of signals is assessed individually for each patient, giving us an additional level of insight into the underlying disease structure. We exemplify the power of our model in prostate cancer and breast cancer and show that it outperforms competing methods. In the prostate cancer data, we identify an entirely new subtype with extremely poor survival outcome and show how other analyses fail to detect it. In the breast cancer data, we find subtypes with superior prognostic value by using the concordant results. These discoveries were crucially dependent on our model’s ability to distinguish concordant and discordant signals within each patient sample, and would otherwise have been missed. We therefore demonstrate the importance of taking a patient-specific approach, using highly flexible nonparametric Bayesian methods.
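
The abstract notes that the model estimates the number of disease subtypes rather than fixing it in advance. As a rough illustration of how a Dirichlet process prior allows this, the sketch below samples cluster labels from a Chinese restaurant process, a standard representation of a single Dirichlet process. It is not the hierarchical model of the paper and uses no real data; the function name `crp_assignments` and the parameters `n_patients`, `alpha` and `seed` are hypothetical names chosen for this example.

```python
import random
from collections import Counter


def crp_assignments(n_patients, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process (illustrative only).

    alpha is the concentration parameter (must be > 0): larger values make
    new clusters more likely. This is a generic Dirichlet process prior,
    NOT the hierarchical model described in the paper.
    """
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of patients already assigned to cluster k
    for _ in range(n_patients):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n_patients=100, alpha=1.0)
    # The number of clusters is an outcome of sampling, not an input.
    print(len(set(labels)), Counter(labels).most_common(3))
```

In a full mixture model each cluster would also carry a likelihood for the observed data, so the assignments would be driven by the expression and copy number measurements as well as by this prior.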
| Item Type: | Journal Article |
|---|---|
| Subjects: | Q Science > QA Mathematics; R Medicine > R Medicine (General) |
| Divisions: | Faculty of Science > Centre for Systems Biology |
| Library of Congress Subject Headings (LCSH): | Cancer -- Prognosis -- Mathematical models, Cancer -- Prognosis -- Data processing |
| Journal or Publication Title: | PLoS Computational Biology |
| Publisher: | Public Library of Science |
| ISSN: | 1553-7358 |
| Date: | 20 October 2011 |
| Volume: | Vol.7 |
| Number: | No.10 |
| Page Range: | e1002227 |
| Identification Number: | 10.1371/journal.pcbi.1002227 |
| Status: | Peer Reviewed |
| Publication Status: | Published |
| Access rights to Published version: | Open Access |
| Funder: | University of Cambridge, Cancer Research UK (CRUK), Hutchison Whampoa Ltd., Medical Research Council (Great Britain) (MRC) |
| URI: | http://wrap.warwick.ac.uk/id/eprint/39088 |

