Skip to content Skip to navigation
University of Warwick
  • Study
  • |
  • Research
  • |
  • Business
  • |
  • Alumni
  • |
  • News
  • |
  • About

University of Warwick
Publications service & WRAP

Highlight your research

  • WRAP
    • Home
    • Search WRAP
    • Browse by Warwick Author
    • Browse WRAP by Year
    • Browse WRAP by Subject
    • Browse WRAP by Department
    • Browse WRAP by Funder
    • Browse Theses by Department
  • Publications Service
    • Home
    • Search Publications Service
    • Browse by Warwick Author
    • Browse Publications service by Year
    • Browse Publications service by Subject
    • Browse Publications service by Department
    • Browse Publications service by Funder
  • Help & Advice
University of Warwick

The Library

  • Login
  • Admin

Continuous sampling from distributed streams

Tools
- Tools
+ Tools

Cormode, Graham, Muthukrishnan, S., Yi, Ke and Zhang, Qin (2012) Continuous sampling from distributed streams. Journal of the ACM, 59 (2). Article number 10. doi:10.1145/2160158.2160163

Research output not available from this repository, contact author.
Official URL: http://dx.doi.org/10.1145/2160158.2160163

Request Changes to record.

Abstract

A fundamental problem in data management is to draw and maintain a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streaming data sets, this problem becomes particularly difficult when the data is shared across multiple distributed sites. The main challenge is to ensure that a sample is drawn uniformly across the union of the data while minimizing the communication needed to run the protocol on the evolving data. At the same time, it is also necessary to make the protocol lightweight, by keeping the space and time costs low for each participant. In this article, we present communication-efficient protocols for continuously maintaining a sample (both with and without replacement) from k distributed streams. These apply to the case when we want a sample from the full streams, and to the sliding window cases of only the W most recent elements, or arrivals within the last w time units. We show that our protocols are optimal (up to logarithmic factors), not just in terms of the communication used, but also the time and space costs for each participant.

Item Type: Journal Article
Subjects: Q Science > QA Mathematics
Divisions: Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): Sampling (Statistics)
Journal or Publication Title: Journal of the ACM
Publisher: ACM Special Interest Group
ISSN: 0004-5411
Official Date: April 2012
Dates:
DateEvent
April 2012Published
Volume: 59
Number: 2
Number of Pages: 25
Article Number: Article number 10
DOI: 10.1145/2160158.2160163
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Funder: Hong Kong University of Science and Technology‏ (HKUST), Google (Firm), National Science Foundation (U.S.) (NSF)
Grant number: 0414852 (NSF)

Request changes or add full text files to a record

Repository staff actions (login required)

View Item View Item
twitter

Email us: wrap@warwick.ac.uk
Contact Details
About Us