
The Library
Data sketching
Tools
Cormode, Graham (2017) Data sketching. Communications of the ACM, 60 (9). pp. 48-55. doi:10.1145/3080008 ISSN 0001-0782.
|
PDF
WRAP-Data-sketching-Cormode-2017.pdf - Accepted Version - Requires a PDF viewer. Download (496Kb) | Preview |
Official URL: http://dx.doi.org/10.1145/3080008
Abstract
Do you ever feel overwhelmed by a constant stream of information? It can seem like there is a barrage of new email and text messages arriving, phone calls, articles to read, and knocks on the door. Putting these pieces together to keep track of what’s important can be a real challenge.
The same information overload is of concern in many computational settings. For example, telecommunications companies want to keep track of the activity on their network, to identify the overall network health, and spot anomalies or changes in behavior. Yet, the scale of events occurring is huge: many millions of network events per hour, per network element. And while new technologies allow the scale and granularity of events being monitored to increase by orders
of magnitude, the capacity of computing elements to make sense of these (processors, memory and disks) is barely increasing. Even on a small scale, the amount of information may be too large
to store in an impoverished setting (say, an embedded device), or be too big to conveniently keep in fast storage.
In response to this challenge, the model of streaming data processing has grown in popularity. In this setting, the aim is no longer to capture, store and index every minute event, but rather to quickly process each observation to create some summary of the current state. Following its processing, an event is dropped, and is no longer accessible. The summary that is retained is often referred to as sketch of the data. Coping with the vast scale of information means making a number of compromises: the description of the world is approximate, rather than exact; the nature of queries to be answered must be decided in advance, rather than after the fact; and some questions are now insoluble. However, the ability to process vast quantities of data at blinding speeds with modest resources can more than make up for these limitations. As a consequence, streaming methods have been adopted in a number of domains, starting with telecommunications, but spreading to search engines, social networks, finance, and time-series analysis. These ideas are also finding application in areas where traditional approaches are applicable, but the rough and ready sketching approach is more cost-effective. Successful applications of sketching involve a mixture of algorithmic tricks, systems know-how, and mathematical insight, and have led to new research contributions in each of these areas.
In this article, we introduce the ideas behind, and applications, of sketching, with a focus on the algorithmic innovations. That is, we describe some algorithmic developments in the abstract, and then indicate the subsequent steps needed to put them into practice, with examples. We will see four novel algorithmic ideas, and discuss some emerging areas.
Item Type: | Journal Article | ||||||
---|---|---|---|---|---|---|---|
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software | ||||||
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science | ||||||
Library of Congress Subject Headings (LCSH): | Big data, Data mining, Data flow computing, Computer algorithms | ||||||
Journal or Publication Title: | Communications of the ACM | ||||||
Publisher: | Association for Computing Machinery, Inc. | ||||||
ISSN: | 0001-0782 | ||||||
Official Date: | September 2017 | ||||||
Dates: |
|
||||||
Volume: | 60 | ||||||
Number: | 9 | ||||||
Page Range: | pp. 48-55 | ||||||
DOI: | 10.1145/3080008 | ||||||
Status: | Peer Reviewed | ||||||
Publication Status: | Published | ||||||
Access rights to Published version: | Restricted or Subscription Access | ||||||
Date of first compliant deposit: | 18 September 2017 | ||||||
Date of first compliant Open Access: | 19 September 2017 | ||||||
Funder: | European Research Council (ERC), Engineering and Physical Sciences Research Council (EPSRC), Royal Society (Great Britain). Wolfson Research Merit Award (RSWRMA) | ||||||
Grant number: | ERC-2014-CoG 647557 (ERC), EP/N510129/1 (EPSRC), EP/N012380/1 (EPSRC) |
Request changes or add full text files to a record
Repository staff actions (login required)
![]() |
View Item |
Downloads
Downloads per month over past year