The Library
Finding structure in data streams : correlations, independent sets, and matchings
Tools
Dark, Jacques (2020) Finding structure in data streams : correlations, independent sets, and matchings. PhD thesis, University of Warwick.
|
PDF
WRAP_Theses_Dark_2020.pdf - Submitted Version - Requires a PDF viewer. Download (1142Kb) | Preview |
Official URL: http://webcat.warwick.ac.uk/record=b3518287~S15
Abstract
The streaming model supposes that, rather than being available all at once, the data is received in a piecemeal fashion. In a world of massive data sets, streaming algorithms give a complementary approach to distributed algorithms: with the data all being
available in one place but at different times, rather than at the same time in different places.
We examine three different single-pass streaming problems where existing results show limited feasibility. We consider realistic relaxations or restrictions of these problems which allow for more efficient algorithms.
In the correlation outliers problem, we wish to identify pairs of unusually correlated signals from a streamed matrix of observations. We show that a simple application of existing technique is space-optimal but has slow query time when the outlier threshold
is small. We demonstrate how we can achieve faster query times at the cost of storing a larger data summary.
In the maximum independent set problem, we wish to find an edge-less induced subgraph of maximum size. For arbitrary graphs, given as a stream of edges, it is known that no space-efficient algorithm exists. We consider a variant streaming model, where
the graph is received vertex by vertex. While we show this model still does not admit efficient algorithms for general graphs, we demonstrate efficient approximation algorithms for various special graph classes.
In the maximum matching problem, we wish to find a disjoint subset of edges of largest possible size. The greedy algorithm gives us an easy 2-approximation for streams of edges, but the problem becomes infeasible to solve if we allow unlimited edge deletions. We consider a model where, instead, a limited number of deletions are allowed. We describe several new approximation algorithms with complexity
parameterised by the number of deletions. We also present new techniques which may lead to the development of corresponding tight lower bounds.
Item Type: | Thesis (PhD) | ||||
---|---|---|---|---|---|
Subjects: | Q Science > QA Mathematics T Technology > TK Electrical engineering. Electronics Nuclear engineering |
||||
Library of Congress Subject Headings (LCSH): | Data structures (Computer science), Streaming technology (Telecommunications), Computer networks, Computer algorithms, Statistical matching | ||||
Official Date: | January 2020 | ||||
Dates: |
|
||||
Institution: | University of Warwick | ||||
Theses Department: | Department of Computer Science | ||||
Thesis Type: | PhD | ||||
Publication Status: | Unpublished | ||||
Supervisor(s)/Advisor: | Cormode, Graham | ||||
Sponsors: | Microsoft Corporation ; Alan Turing Institute ; Engineering and Physical Sciences Research Council | ||||
Extent: | v, 129 leaves | ||||
Language: | eng |
Request changes or add full text files to a record
Repository staff actions (login required)
View Item |
Downloads
Downloads per month over past year