The Library
Theory meets practice at the median : a worst case comparison of relative error quantile algorithms
Tools
Cormode, Graham, Mishra, Abhinav, Ross, Joseph and Veselý, Pavel (2021) Theory meets practice at the median : a worst case comparison of relative error quantile algorithms. In: ACM SIGKDD Conference, Virtual Event, Singapore, 14–18 Aug 2021. Published in: KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining pp. 2722-2731. ISBN 978145038332. doi:10.1145/3447548.3467152
|
PDF
WRAP-Theory-meets-practice-median-worst-error-quantile-algorithms-2021.pdf - Accepted Version - Requires a PDF viewer. Download (1646Kb) | Preview |
Official URL: https://doi.org/10.1145/3447548.3467152
Abstract
Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to understand the tails of the input distribution. Namely, we focus on two recent state-of-the-art solutions: t-digest and ReqSketch. While t-digest is a popular compact summary which works well in a variety of settings, ReqSketch comes with formal accuracy guarantees at the cost of its size growing as new observations are inserted. In this work, we provide insight into which conditions make one preferable to the other. Namely, we show how to construct inputs for t-digest that induce an almost arbitrarily large error and demonstrate that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution. We propose practical improvements to ReqSketch, making it faster than t-digest, while its error stays bounded on any instance. Still, our results confirm that t-digest remains more accurate on the "non-adversarial" data encountered in practice.
Item Type: | Conference Item (Paper) | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software | ||||||||||||
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science | ||||||||||||
Library of Congress Subject Headings (LCSH): | Data mining, Database management, Computer algorithms | ||||||||||||
Journal or Publication Title: | KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining | ||||||||||||
Publisher: | ACM | ||||||||||||
ISBN: | 978145038332 | ||||||||||||
Official Date: | 14 August 2021 | ||||||||||||
Dates: |
|
||||||||||||
Page Range: | pp. 2722-2731 | ||||||||||||
DOI: | 10.1145/3447548.3467152 | ||||||||||||
Status: | Peer Reviewed | ||||||||||||
Publication Status: | Published | ||||||||||||
Reuse Statement (publisher, data, author rights): | "© ACM, 2021. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, http://doi.acm.org/10.1145/3447548.3467152 | ||||||||||||
Access rights to Published version: | Restricted or Subscription Access | ||||||||||||
Date of first compliant deposit: | 24 June 2021 | ||||||||||||
Date of first compliant Open Access: | 24 June 2021 | ||||||||||||
RIOXX Funder/Project Grant: |
|
||||||||||||
Conference Paper Type: | Paper | ||||||||||||
Title of Event: | ACM SIGKDD Conference | ||||||||||||
Type of Event: | Conference | ||||||||||||
Location of Event: | Virtual Event, Singapore | ||||||||||||
Date(s) of Event: | 14–18 Aug 2021 | ||||||||||||
Related URLs: |
Request changes or add full text files to a record
Repository staff actions (login required)
View Item |
Downloads
Downloads per month over past year