Statistical inference from large-scale genomic data
Yuan, Yinyin (2009) Statistical inference from large-scale genomic data. PhD thesis, University of Warwick.
WRAP_THESIS_Yuan_2009.pdf - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Official URL: http://webcat.warwick.ac.uk/record=b2260449~S15
This thesis explores the potential of statistical inference methodologies in their applications in functional genomics. In essence, it summarises algorithmic findings in this field, providing step-by-step analytical methodologies for deciphering biological knowledge from large-scale genomic data, mainly microarray gene expression time series.
This thesis covers a range of topics in the investigation of complex multivariate genomic data. One focus involves using clustering as a method of inference and another is cluster validation to extract meaningful biological information from the data. Information gained from the application of these various techniques can then be used conjointly in the elucidation of gene regulatory networks, the ultimate goal of this type of analysis. First, a new tight clustering method for gene expression data is proposed to obtain tighter and potentially more informative gene clusters. Next, to fully utilise biological knowledge in clustering validation, a validity index is defined based on one of the most important ontologies within the Bioinformatics community, Gene Ontology. The method bridges a gap in current literature, in the sense that it takes into account not only the variations of Gene Ontology categories in biological specificities and their significance to the gene clusters, but also the complex structure of the Gene Ontology. Finally, Bayesian probability is applied to making inference from heterogeneous genomic data, integrated with previous efforts in this thesis, for the aim of large-scale gene network inference. The proposed system comes with a stochastic process to achieve robustness to noise, yet remains efficient enough for large-scale analysis.
Ultimately, the solutions presented in this thesis serve as building blocks of an intelligent system for interpreting large-scale genomic data and understanding the functional organisation of the genome.
|Item Type:||Thesis or Dissertation (PhD)|
|Subjects:||Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software|
|Library of Congress Subject Headings (LCSH):||Genomics -- Data processing, Genomics -- Statistical methods, Mathematical statistics -- Data processing, Bioinformatics -- Methodology, Ontology|
|Official Date:||March 2009|
|Institution:||University of Warwick|
|Theses Department:||Department of Computer Science|
|Supervisor(s)/Advisor:||Li, Chang-Tsun ; Wilson, Roland, 1949-|
|Format of File:|
|Extent:||205 leaves : col. ill., charts|
Actions (login required)