The Library

An analysis of single amino acid repeats as use case for application specific background models

Łabaj, Paweł P, Sykacek, Peter and Kreil, David (2011) An analysis of single amino acid repeats as use case for application specific background models. BMC Bioinformatics, Vol.12 (No.1). p. 173. doi:10.1186/1471-2105-12-173 ISSN 1471-2105.

Research output not available from this repository.

Official URL: http://dx.doi.org/10.1186/1471-2105-12-173

Abstract

Background
Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. It is increasingly recognized that the quality of the background model directly affects the performance of such analyses. State-of-the-art approaches rely on classical sequence models adapted to the dataset under study. Although these models perform well in the analysis of globular protein domains, they break down in regions of strong compositional bias or low complexity. Such regions are typically filtered out, yet there is growing anecdotal evidence that they have functional roles. This motivates an exploration of more complex sequence models and of application-specific approaches for investigating biased regions.

Results
Traditional Markov chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-validation experiments reveal that the regression models capture the multivariate trends well despite their low dimensionality, in contrast even to higher-order Markov predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model to detect biologically interesting signals is then demonstrated in an analysis that identifies an unexpected enrichment of contiguous leucine repeats in signal peptides. By considering different reference sets, we show that the question being examined defines what constitutes the 'background'. Results can therefore be highly sensitive to the choice of model training set. Conversely, the choice of reference data determines which questions an analysis can investigate.
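To make the Markov-chain side of this comparison concrete, the following is a minimal Python sketch of how a first-order Markov background could be used to score runs of a single residue. It is not the authors' regression model or their significance procedure. The function names (`max_runs_at_least`, `fit_markov1`, `expected_runs`, `poisson_upper_tail`) are hypothetical. Two further assumptions are made for illustration: the chain is taken to be stationary, and the observed run count is treated as Poisson-distributed around the model's expectation, which is only reasonable when long runs are rare. Under a first-order chain, a maximal run of residue a with length at least k starts at position i ≥ 1 with probability q_a · P_aa^(k-1), where q_a = P(X_{i-1} ≠ a, X_i = a), and at position 0 with probability π_a · P_aa^(k-1).

```python
import math
from collections import Counter


def max_runs_at_least(seq, residue, k):
    """Count maximal runs of `residue` with length >= k in `seq`."""
    count, run = 0, 0
    for ch in seq + "\0":  # sentinel character flushes the final run
        if ch == residue:
            run += 1
        else:
            if run >= k:
                count += 1
            run = 0
    return count


def fit_markov1(train_seqs, residue):
    """Estimate pi_a, P(a -> a) and q_a = P(X_{i-1} != a, X_i = a)
    from training (reference) sequences by simple frequency counts.
    Assumes the training set contains at least one residue pair."""
    n_res = sum(len(s) for s in train_seqs)
    n_a = sum(s.count(residue) for s in train_seqs)
    pairs = Counter()
    for s in train_seqs:
        for x, y in zip(s, s[1:]):
            pairs[(x == residue, y == residue)] += 1
    n_pairs = sum(pairs.values())
    from_a = pairs[(True, True)] + pairs[(True, False)]
    pi_a = n_a / n_res
    p_aa = pairs[(True, True)] / from_a if from_a else 0.0
    q_a = pairs[(False, True)] / n_pairs
    return pi_a, p_aa, q_a


def expected_runs(lengths, k, pi_a, p_aa, q_a):
    """Expected number of maximal runs of length >= k over sequences
    of the given lengths, assuming a stationary first-order chain."""
    total = 0.0
    for length in lengths:
        if length >= k:
            # one possible start at position 0, plus starts at 1..length-k
            total += (pi_a + (length - k) * q_a) * p_aa ** (k - 1)
    return total


def poisson_upper_tail(observed, lam):
    """P(X >= observed) for X ~ Poisson(lam); the Poisson model is an
    assumption made here, not part of the original analysis."""
    term, cdf = math.exp(-lam), 0.0
    for j in range(observed):
        cdf += term
        term *= lam / (j + 1)
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    train = ["MKLLLAVLLSA", "MSTNPKPQRKT", "MALWMRLLPLL"]
    query = ["MKLLLLLLLLLAS"]
    pi_a, p_aa, q_a = fit_markov1(train, "L")
    lam = expected_runs([len(s) for s in query], 4, pi_a, p_aa, q_a)
    obs = sum(max_runs_at_least(s, "L", 4) for s in query)
    print(obs, lam, poisson_upper_tail(obs, lam))
```

The training set here plays the role of the reference data: swapping it (for example, general proteins versus signal peptides only) changes the expected count and hence the p-value, which illustrates the abstract's point that the choice of reference set shapes what counts as background.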

Conclusions
Using the study of biased regions as a specific example, we have demonstrated that constructing application-specific background models is both necessary and feasible in a challenging sequence analysis situation.

Item Type: Journal Article

Divisions: Faculty of Science, Engineering and Medicine > Science > Life Sciences (2010- )

Journal or Publication Title: BMC Bioinformatics

Publisher: BioMed Central Ltd.

ISSN: 1471-2105

Official Date: 19 May 2011

Dates:

Date	Event
19 May 2011	Published

Volume: Vol.12

Number: No.1

Page Range: p. 173

DOI: 10.1186/1471-2105-12-173

Status: Peer Reviewed

Publication Status: Published

Access rights to Published version: Restricted or Subscription Access

University of Warwick
Publications service & WRAP
