
Bayesian methods and data science with health informatics data
Peneva, Iliana Stanimirova (2019) Bayesian methods and data science with health informatics data. PhD thesis, University of Warwick.
PDF: WRAP_Theses_Peneva_2019.pdf (Submitted Version, 5 MB)
Official URL: http://webcat.warwick.ac.uk/record=b3474575~S15
Abstract
Cancer is a complex disease, driven by a range of genetic and environmental factors. Every year millions of people are diagnosed with some form of cancer, and for many of them the survival prognosis is poor because the causes of some cancers are not well understood. Modern large-scale studies offer a great opportunity to study the mechanisms underlying different types of cancer, but they also bring the challenges of selecting informative features, estimating the number of cancer subtypes, and producing interpretable results.
In this thesis, we address these challenges by developing efficient clustering algorithms based on Dirichlet process mixture models, which can be applied to different data types (continuous, discrete, mixed) and to multiple data sources (in our case, molecular and clinical data) simultaneously. We show how our methodology addresses the drawbacks of widely used clustering methods such as k-means and iClusterPlus. We also introduce a more efficient version of these clustering methods that uses simulated annealing in the inference stage.
We apply the data integration methods to data from The Cancer Genome Atlas (TCGA), which include clinical and molecular data on glioblastoma, breast cancer, colorectal cancer, and pancreatic cancer. In two aggressive types of cancer, pancreatic cancer and glioblastoma, we find subtypes that are prognostic of overall survival and that were not identified by the comparison models. We also analyse a Hospital Episode Statistics (HES) dataset comprising clinical information on all pancreatic cancer patients in the United Kingdom who underwent surgery between 2001 and 2016. We investigate the effect of centralisation on the short- and long-term survival of these patients, as well as the factors affecting patient survival. Our analyses show that higher-volume surgery centres are associated with lower 90-day mortality rates, and that age, Index of Multiple Deprivation and diagnosis type are significant risk factors for short-term survival.
Our findings suggest that analysing large, complex molecular datasets, coupled with methodological advances, can provide valuable insights into the cancer genome and the associated molecular mechanisms.
| Field | Value |
|---|---|
| Item Type | Thesis (PhD) |
| Subjects | Q Science > QA Mathematics; R Medicine > R Medicine (General); R Medicine > RC Internal medicine |
| Library of Congress Subject Headings (LCSH) | Bayesian statistical decision theory; Medical informatics -- Data processing; Cancer -- Research |
| Official Date | January 2019 |
| Institution | University of Warwick |
| Theses Department | Mathematics for Real-World Systems Centre for Doctoral Training |
| Thesis Type | PhD |
| Publication Status | Unpublished |
| Supervisor(s)/Advisor | Savage, Richard S.; Roberts, Keith; Evison, Felicity; Moss, Paul |
| Extent | xxiv, 252 leaves : illustrations |
| Language | English (eng) |