Variable structure motifs for transcription factor binding sites
Reid, J. E. (John E.), Evans, Kenneth J., Dyer, Nigel, Wernisch, Lorenz and Ott, Sascha. (2010) Variable structure motifs for transcription factor binding sites. BMC Genomics, Vol.11 (Article 30). ISSN 1471-2164
Official URL: http://dx.doi.org/10.1186/1471-2164-11-30
Background: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.
Results: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross-validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition, our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.
Conclusions: We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
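Illustration: the idea of a PWM with a variable length site can be sketched in a few lines of Python. This is not the model or algorithm from the paper. It is a minimal toy that assumes a single optional column, a uniform background of 0.25 per base, and a hypothetical parameter p_present for how often the optional column is used. The function name, the parameter names and the toy motif are all invented for illustration.

```python
import math

BACKGROUND = 0.25  # uniform background base frequency (an assumption)


def score_gapped_pwm(columns, optional, p_present, site):
    """Log-odds score of `site` under a PWM whose column `optional` may be skipped.

    columns   : list of dicts mapping 'A', 'C', 'G', 'T' to probabilities
    optional  : index of the one column that may be absent
    p_present : probability that the optional column is used (hypothetical)
    site      : candidate site of len(columns) or len(columns) - 1 bases
    """
    full = len(columns)
    if len(site) == full:
        # Long configuration: every column is used.
        used = columns
        score = math.log(p_present)
    elif len(site) == full - 1:
        # Short configuration: the optional column is skipped.
        used = columns[:optional] + columns[optional + 1:]
        score = math.log(1.0 - p_present)
    else:
        raise ValueError("site length does not match either configuration")
    # Add the per-position log-odds against the background.
    for column, base in zip(used, site):
        score += math.log(column[base] / BACKGROUND)
    return score


# Toy three-column motif; the uninformative middle column is optional.
toy = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
long_score = score_gapped_pwm(toy, 1, 0.5, "ACG")
short_score = score_gapped_pwm(toy, 1, 0.5, "AG")
```

A fixed length PWM could score only one of the two toy sites, "ACG" or "AG". Under the sketch above, both lengths are scored by the same motif, which is the kind of flexibility the abstract describes.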
|Item Type:||Journal Article|
|Subjects:||Q Science > QH Natural history > QH426 Genetics|
|Divisions:||Faculty of Science > Molecular Organisation and Assembly in Cells (MOAC); Faculty of Science > Centre for Systems Biology|
|Library of Congress Subject Headings (LCSH):||DNA-binding proteins -- Research, Genetic transcription, Transcription factors, Genomics, Genetics -- Mathematical models|
|Journal or Publication Title:||BMC Genomics|
|Publisher:||BioMed Central Ltd.|
|Official Date:||14 January 2010|
|Access rights to Published version:||Open Access|
|Funder:||Engineering and Physical Sciences Research Council (EPSRC), Research Councils UK (RCUK)|