Informative sequence-based models for fragment distributions in ChIP-seq, RNA-seq and ChIP-chip data
Dyer, Nigel (2011) Informative sequence-based models for fragment distributions in ChIP-seq, RNA-seq and ChIP-chip data. PhD thesis, University of Warwick.
WRAP_THESIS_Dyer_2011.pdf - Submitted Version
Official URL: http://webcat.warwick.ac.uk/record=b2582571~S1
Many high-throughput sequencing protocols for RNA and DNA require the polynucleic acid to be fragmented, so that a limited number of nucleotides at one or both ends of each fragment can be identified by sequencing. The nucleotide sequence allows the fragment to be located within the genome, and the resulting fragment distribution can then be used for a variety of purposes. For DNA, these include identifying the locations where specific proteins are bound to the genome. For RNA, they include quantifying the expression levels of different gene variants or transcripts. If the locations of the fragments are partly determined by the underlying nucleotide sequence, any results derived from the data could be biased. Unfortunately, such sequence dependencies have already been observed in the distributions of both RNA and DNA fragments. Previous analyses aimed at reducing this bias have examined the role of regional characteristics such as GC content, or the bias towards a specific sequence at the start of the fragments. This thesis introduces a new method for modelling the bias that considers the degree to which the nucleotide sequence affects the likelihood of a fragment originating at each location. The method shows that there is often not a single bias characteristic, but several alternative sequence biases that coexist within a single dataset. It also shows that the nucleotide sequence immediately flanking the fragment has a significant effect on the fragment likelihood. This new approach highlights characteristics that were previously hidden and provides a more powerful basis for correcting such bias.
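The abstract does not spell out the model itself. As a rough illustration of what a position-specific sequence-bias profile around fragment start sites looks like, the Python sketch below counts nucleotides at fixed offsets around each start and compares them with the overall base composition, giving a log2 enrichment for every offset and base. This is a simple descriptive approach swapped in for illustration, not the thesis's model, which additionally captures several coexisting biases. The function names, the window parameters `upstream`/`downstream`, and the restriction to forward-strand starts on an A/C/G/T-only sequence are all assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def start_bias_profile(genome, starts, upstream=2, downstream=3, pseudocount=1.0):
    """Log2 enrichment of each base at each offset around fragment start sites.

    Hypothetical helper. genome: forward-strand sequence over A/C/G/T.
    starts: 0-based positions where fragments begin on the forward strand.
    Offsets run from -upstream (flanking sequence outside the fragment)
    to downstream - 1 (first bases inside the fragment).
    """
    # Background composition of the whole sequence, with pseudocounts.
    bg = Counter(b for b in genome if b in BASES)
    bg_total = sum(bg.values())
    bg_freq = {b: (bg[b] + pseudocount) / (bg_total + 4 * pseudocount) for b in BASES}

    offsets = range(-upstream, downstream)
    counts = {o: Counter() for o in offsets}
    for s in starts:
        # Skip starts whose window would run off either end of the sequence.
        if s - upstream < 0 or s + downstream > len(genome):
            continue
        for o in offsets:
            counts[o][genome[s + o]] += 1

    # Observed base frequency at each offset divided by background frequency.
    profile = {}
    for o in offsets:
        n = sum(counts[o][b] for b in BASES)
        profile[o] = {
            b: math.log2(((counts[o][b] + pseudocount) / (n + 4 * pseudocount)) / bg_freq[b])
            for b in BASES
        }
    return profile


def site_score(genome, pos, profile):
    """Sum of per-offset log2 enrichments: higher means a more favoured start."""
    return sum(profile[o][genome[pos + o]] for o in profile)
```

For example, with `genome = "ACGT" * 10` and starts at 6, 10, 14 and 18 (each at a G), `profile[0]["G"]` is log2(2.5), and `site_score` ranks position 6 above position 7. Reverse-strand reads would need their windows reverse-complemented before counting, which this sketch omits.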
Multiple alternative sequence biases are observed when either RNA or DNA is fragmented, but the more detailed information provided by the new technique shows how these characteristics differ between RNA and DNA, indicating that very different molecular mechanisms are responsible for the biases in the two processes. This thesis also shows how removing the effect of this bias in ChIP-seq experiments can reveal more subtle features of the fragment distribution. These features can provide information, with per-nucleotide precision, on the nature of the binding between proteins and DNA, revealed through the change in the likelihood of the DNA fragmenting at each position in the binding site. Finally, the model-fitting technique developed to analyse sequence bias is shown to extract additional information from the results of ChIP-chip experiments. The approach is used to find the nucleotide sequence preferences of DNA-binding proteins, and also the cooperative effects associated with binding at multiple binding sites in close proximity.
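Removing sequence bias from ChIP-seq read distributions, as described above, can be done in several ways. One simple option, assumed here rather than taken from the thesis, is to down-weight each read start by its predicted bias. The Python sketch below scores the sequence around each start with a position-specific log2 profile, weights the read by 2 raised to minus that score, and rescales so the total weight still equals the number of reads used. The function name and the profile format (offset mapped to a per-base log2 enrichment) are hypothetical.

```python
from collections import defaultdict


def bias_corrected_counts(genome, starts, profile):
    """Weight each read start by the inverse of its predicted sequence bias.

    Hypothetical helper. profile maps offset -> {base: log2 enrichment}; a start
    whose surrounding sequence scores s is down-weighted by a factor 2**s.
    Returns {position: corrected count}, summing to the number of usable reads.
    """
    lo, hi = min(profile), max(profile)
    raw = defaultdict(float)
    n_used = 0
    for s in starts:
        if s + lo < 0 or s + hi >= len(genome):
            continue  # window falls off the sequence; skip this read
        score = sum(profile[o][genome[s + o]] for o in profile)
        raw[s] += 2.0 ** (-score)  # reads at the same position accumulate
        n_used += 1
    total = sum(raw.values())
    if total == 0:
        return {}
    # Rescale so corrected counts are comparable with the raw read total.
    return {pos: w * n_used / total for pos, w in raw.items()}
```

With a one-offset profile in which a G at the start is twice as likely (`{0: {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0}}`), one read starting at a G and one at a T on `"ACGT" * 10` receive corrected counts of 2/3 and 4/3. Rescaling keeps overall coverage unchanged while shifting weight away from sequence-favoured positions.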
| Field | Value |
|---|---|
| Item Type | Thesis or Dissertation (PhD) |
| Subjects | Q Science > QP Physiology |
| Library of Congress Subject Headings (LCSH) | Nucleotide sequence |
| Institution | University of Warwick |
| Theses Department | Molecular Organisation and Assembly in Cells |
| Supervisor(s)/Advisor | Ott, Sascha ; Beynon, Jim, 1956- |
| Sponsors | Engineering and Physical Sciences Research Council (EPSRC) |
| Extent | xvi, 197 leaves : charts |