Skip to content Skip to navigation
University of Warwick
  • Study
  • |
  • Research
  • |
  • Business
  • |
  • Alumni
  • |
  • News
  • |
  • About

University of Warwick
Publications service & WRAP

Highlight your research

  • WRAP
    • Home
    • Search WRAP
    • Browse by Warwick Author
    • Browse WRAP by Year
    • Browse WRAP by Subject
    • Browse WRAP by Department
    • Browse WRAP by Funder
    • Browse Theses by Department
  • Publications Service
    • Home
    • Search Publications Service
    • Browse by Warwick Author
    • Browse Publications service by Year
    • Browse Publications service by Subject
    • Browse Publications service by Department
    • Browse Publications service by Funder
  • Help & Advice
University of Warwick

The Library

  • Login
  • Admin

Sequence distance embeddings

Tools
- Tools
+ Tools

Cormode, Graham (2003) Sequence distance embeddings. PhD thesis, University of Warwick.

[img]
Preview
PDF
WRAP_THESIS_Cormode_2003.pdf - Submitted Version - Requires a PDF viewer.

Download (1780Kb) | Preview
Official URL: http://webcat.warwick.ac.uk/record=b1663364~S1

Request Changes to record.

Abstract

Sequences represent a large class of fundamental objects in Computer Science sets, strings, vectors and permutations are considered to be sequences. Distances between sequences measure their similarity, and computations based on distances are ubiquitous: either to compute the distance, or to use distance computation as part of a more complex problem. This thesis takes a very specific approach to solving questions of sequence distance: sequences are embedded into other distance measures, so that distance in the new space approximates the original distance. This allows the solution of a variety of problems including: Fast computation of short sketches in a variety of computing models, which allow sequences to be compared in constant time and space irrespective of the size of the original sequences. Approximate nearest neighbor and clustering problems, significantly faster than the naive exact solutions. Algorithms to find approximate occurrences of pattern sequences in long text sequences in near linear time. Efficient communication schemes to approximate the distance between, and exchange, sequences in close to the optimal amount of communication. </li> </ul> Solutions are given for these problems for a variety of distances, including fundamental distances on sets and vectors; distances inspired by biological problems for permutations; and certain text editing distances for strings. Many of these embeddings are computable in a streaming model where the data is too large to store in memory, and instead has to be processed as and when it arrives, piece by piece. The embeddings are also shown to be practical, with a series of large scale experiments which demonstrate that given only a small space, approximate solutions to several similarity and clustering problems can be found that are as good as or better than those found with prior methods.

Item Type: Thesis or Dissertation (PhD)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: Faculty of Science > Computer Science
Library of Congress Subject Headings (LCSH): Sequences (Mathematics)
Publisher: Department of Computer Science
Place of Publication: Coventry, UK
Official Date: January 2003
Dates:
DateEvent
January 2003Submitted
DOI: CS-RR-393
Institution: University of Warwick
Theses Department: Department of Computer Science
Thesis Type: PhD
Publication Status: Published
Supervisor(s)/Advisor: Paterson, Michael S.
Extent: xvii, 275 p.
Language: eng

Request changes or add full text files to a record

Repository staff actions (login required)

View Item View Item

Downloads

Downloads per month over past year

View more statistics

twitter

Email us: wrap@warwick.ac.uk
Contact Details
About Us