University of Warwick
Publications service & WRAP

Sampling for big data

Cormode, Graham and Duffield, Nick (2014) Sampling for big data. In: 20th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, USA, 24-27 Aug 2014, p. 1975. ISBN 9781450329569. doi:10.1145/2623330.2630811

The full text of this research output is not available from this repository; please contact the authors.
Official URL: http://dx.doi.org/10.1145/2623330.2630811

Abstract

One response to the proliferation of large datasets has been to develop ingenious ways to throw resources at the problem, using massive fault-tolerant storage architectures and parallel and graph computation models such as MapReduce, Pregel and Giraph. However, not all environments can support resources at this scale, and not all queries need an exact response. This motivates the use of sampling to generate summary datasets that support rapid queries and prolong the useful life of the data in storage. To be effective, sampling must mediate the tensions between resource constraints, data characteristics, and the required query accuracy. To maximize the usefulness of the resulting sample, the state of the art in sampling goes far beyond simple uniform selection of elements. This tutorial reviews progress in sample design for large datasets, including streaming and graph-structured data, and discusses applications to sampling network traffic and social networks.
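The abstract stays at the level of motivation and gives no algorithms. To make the contrast between simple uniform selection and more deliberate sample design concrete, the following is a minimal Python sketch of two standard streaming schemes. It is an illustration only, not the tutorial's own formulation: reservoir sampling (Algorithm R) keeps a uniform sample of fixed size k from a stream of unknown length, while priority sampling (Duffield, Lund and Thorup) favours heavy items, such as large network flows, and attaches to each kept item a weight estimate max(w, tau), where tau is the (k+1)-th largest priority. The function names, the (key, weight) input format and the synthetic flow data are hypothetical choices made for this sketch.

```python
import heapq
import random
from typing import Dict, Hashable, Iterable, List, Tuple


def reservoir_sample(stream: Iterable, k: int, rng: random.Random) -> List:
    """Uniform sample of up to k items from a stream of unknown length (Algorithm R)."""
    reservoir: List = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep item i (0-based) with probability k / (i + 1),
            # overwriting a uniformly chosen slot of the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def priority_sample(
    stream: Iterable[Tuple[Hashable, float]], k: int, rng: random.Random
) -> Dict[Hashable, float]:
    """Priority sampling of (key, weight) pairs with positive weights.

    Each item gets priority w / u with u uniform on (0, 1]; the k highest
    priorities are kept. Returns {key: estimated weight}, where the estimate
    max(w, tau) uses tau = (k+1)-th largest priority (0 if there are at most k items).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    heap: List[Tuple[float, int, Hashable, float]] = []  # min-heap ordered by priority
    tau = 0.0  # largest priority among discarded items = (k+1)-th largest overall
    for idx, (key, w) in enumerate(stream):
        u = 1.0 - rng.random()  # uniform on (0, 1], so no division by zero
        entry = (w / u, idx, key, w)  # idx breaks ties without comparing keys
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            evicted = heapq.heapreplace(heap, entry)  # drop the current minimum
            tau = max(tau, evicted[0])
        else:
            tau = max(tau, entry[0])
    return {key: max(w, tau) for _, _, key, w in heap}


if __name__ == "__main__":
    rng = random.Random(7)
    print(reservoir_sample(range(10_000), k=5, rng=rng))

    # Hypothetical flow records: (flow id, bytes), with weights 1..100 repeating.
    flows = [(f"flow{i}", float(i % 100 + 1)) for i in range(10_000)]
    estimates = priority_sample(flows, k=200, rng=rng)
    print("estimated total:", round(sum(estimates.values())))
    print("exact total:", sum(w for _, w in flows))  # 505000.0
```

Reservoir sampling treats every element alike, so a handful of very large flows can be missed entirely. Priority sampling keeps heavy items with higher probability, and the max(w, tau) adjustment keeps estimates of subset sums unbiased, which is the kind of accuracy-versus-resources trade-off the abstract refers to.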

Item Type: Conference Item (Paper)
Divisions: Faculty of Science > Computer Science
Publisher: ACM, New York
ISBN: 9781450329569
Book Title: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14
Official Date: 24 August 2014
Dates: 24 August 2014 (Published)
Page Range: p. 1975
DOI: 10.1145/2623330.2630811
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Embodied As: 1
Conference Paper Type: Paper
Title of Event: 20th ACM SIGKDD international conference on Knowledge discovery and data mining
Type of Event: Conference
Location of Event: New York, USA
Date(s) of Event: 24-27 Aug 2014
Related URLs:
  • http://dl.acm.org/citation.cfm?id=263081...
