University of Warwick
Publications service & WRAP

Computational prediction of functional similarity of CRMs

Koohy, Hashem (2010) Computational prediction of functional similarity of CRMs. PhD thesis, University of Warwick.

WRAP_Thesis_Koohy_2010.pdf - Submitted Version - Requires a PDF viewer.

Official URL: http://webcat.warwick.ac.uk/record=b2717156~S1

Abstract

Transcriptional regulation of genes is fundamental to all living organisms. The spatial, temporal and condition-specific expression levels of genes are in part determined by inherited regulatory codes in non-coding regions of the DNA. Many methods have been proposed to detect conserved regions of regulatory DNA by means of sequence alignments. However, it has become clear that some regulatory regions do not show statistically significant alignments even in the presence of functional conservation. Therefore, detecting and characterising elusive regulatory codes remains a challenging problem.
In this thesis we develop and validate a novel alignment-free computational model for detecting functional similarity of regulatory sequences. We show that our model can detect functional links between pairs of sequences that do not align with a significant score. We apply the model to (a) detect enhancers within the same genome that are likely to have similar functions and (b) detect functionally conserved enhancer regions in orthologous genomes. Our method finds regulatory codes that are common to groups of similar enhancers and consistent with previous biological knowledge.
The inputs for our model are two sequences that we wish to compare in terms of their functional similarity, together with a set of transcription factor motifs. The mathematical framework of our model is built on two main components. In the first model component, each sequence is mapped to a vector of estimated occupancy levels for all motifs. These vectors represent which motifs are present in each sequence, and at what multiplicity and specificity.
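The mapping from a sequence to a vector of motif occupancy levels can be illustrated with a minimal sketch. The abstract does not specify how occupancy is estimated, so everything below is an assumption made for illustration: motifs are represented as position weight matrices, the background is uniform, only the forward strand is scanned, and the occupancy of a motif is taken as the sum over positions of w / (1 + w), where w is the motif-versus-background likelihood ratio of the site. The names `site_weight`, `occupancy`, `occupancy_vector` and the motif `toyTA` are hypothetical.

```python
def site_weight(pwm, site, background=0.25):
    """Likelihood ratio of a site under the motif versus a uniform background."""
    weight = 1.0
    for column, base in zip(pwm, site):
        weight *= column[base] / background
    return weight


def occupancy(pwm, sequence):
    """Sum of per-position binding probabilities w / (1 + w) along the sequence.

    Assumed form, not taken from the thesis. Strong sites contribute close to 1
    each, weak sites contribute small but non-zero amounts.
    """
    width = len(pwm)
    total = 0.0
    for start in range(len(sequence) - width + 1):
        w = site_weight(pwm, sequence[start:start + width])
        total += w / (1.0 + w)
    return total


def occupancy_vector(motifs, sequence):
    """Map a sequence to one occupancy value per motif, in sorted motif-name order."""
    return [occupancy(motifs[name], sequence) for name in sorted(motifs)]


# Hypothetical two-column motif that prefers the dinucleotide "TA".
# Sequences are assumed to contain only A, C, G and T.
toy_motifs = {
    "toyTA": [
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    ],
}

rich = occupancy_vector(toy_motifs, "TATATA")
poor = occupancy_vector(toy_motifs, "GCGCGC")
```

In this sketch, repeated strong sites raise the occupancy value (multiplicity), while the likelihood ratio controls how much each individual site contributes (specificity).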
In the second model component, a statistical approach is established in which we first estimate a probability distribution of motif occupancy levels for sequences that function similarly to the template sequence. We then compute a statistical similarity score to evaluate whether the two sequences are more similar to each other than to random background sequences.
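The idea of testing whether two sequences are more similar to each other than to random background sequences can be sketched as follows. This is not the probabilistic model of the thesis, whose details the abstract does not give. It swaps in a simpler empirical comparison: the Euclidean distance between two occupancy vectors is compared with the distances from the template to a sample of background vectors. The function names and all numbers are made up for illustration.

```python
import math


def distance(u, v):
    """Euclidean distance between two occupancy vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def empirical_p_value(template, candidate, background_vectors):
    """Fraction of background vectors at least as close to the template as the candidate.

    The +1 terms keep the estimate away from zero for a finite background sample.
    """
    observed = distance(template, candidate)
    as_close = sum(
        1 for b in background_vectors if distance(template, b) <= observed
    )
    return (as_close + 1) / (len(background_vectors) + 1)


# Made-up occupancy vectors, for illustration only.
template = [2.9, 0.4, 1.1]
candidate = [2.7, 0.5, 1.0]
background = [
    [0.7, 0.6, 0.2],
    [1.0, 1.5, 0.3],
    [0.5, 0.2, 2.0],
    [1.2, 0.9, 0.8],
]

p = empirical_p_value(template, candidate, background)
```

A small value of `p` indicates that few background sequences resemble the template as closely as the candidate does. In practice the background sample would need to be far larger than the four vectors used here.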
Two applications of this model are presented. First, it is applied to a set of experimentally validated non-alignable enhancers from D. melanogaster. We show that:
• Our model can detect statistical links between these enhancers,
• Weak binding sites can make a strong contribution to sequence similarity,
• Our model treats statistically significant presence and absence of motifs symmetrically. Similarity of sequences, therefore, can be based on a combination of the two. We show examples of motifs making contributions to sequence similarity through their absence.
• Using our model, we can create a network of similarities among the fly enhancers. Groups of enhancers in this network show common regulatory codes. One of these regulatory codes is strongly supported by existing experimental data.
In the second application of our model, we predict functional subregions of a known D. melanogaster enhancer. To achieve this, we first show that the model can detect the orthology of this enhancer across 10 Drosophila species. We then demonstrate how this statistical link can be used to predict functional subregions within this enhancer.

Item Type: Thesis (PhD)
Subjects: Q Science > QH Natural history > QH426 Genetics
Library of Congress Subject Headings (LCSH): Genetic regulation, Nucleotide sequence -- Mathematical models
Official Date: October 2010
Dates:
Date: October 2010
Event: Submitted
Institution: University of Warwick
Theses Department: Systems Biology Doctoral Training Centre
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Ott, Sascha ; Koentges, Georgy
Sponsors: Warwick Systems Biology Centre ; Human Frontier Science Program
Extent: xii, 127 leaves : ill., charts
Language: eng
