Data-independent space partitionings for summaries
Cormode, Graham, Garofalakis, Minos and Shekelyan, Michael (2021) Data-independent space partitionings for summaries. In: The 2021 ACM SIGMOD/PODS Conference, Virtual conference, 20-25 Jun 2021. Published in: PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems pp. 285-298. ISBN 9781450383813. doi:10.1145/3452021.3458316
PDF: WRAP-Data-independent-space-partitionings-summaries-2021.pdf (Accepted Version, 1278Kb)
Official URL: https://doi.org/10.1145/3452021.3458316
Abstract
Histograms are a standard tool in data management for describing multidimensional data. It is often convenient or even necessary to define data-independent histograms, that is, to partition space in advance without observing the data itself. Specific motivations arise in managing data when it is not suitable to frequently change the boundaries between histogram cells: for example, when the data is subject to many insertions and deletions, when data is distributed across multiple systems, or when producing a privacy-preserving representation of the data. The baseline approach is to consider an equiwidth histogram, i.e., a regular grid over the space. However, this is not optimal for the objective of splitting the multidimensional space into (possibly overlapping) bins, such that each box can be rebuilt using a set of non-overlapping bins with minimal excess (or deficit) of volume. Thus, we investigate how to split the space into bins and identify novel solutions that offer a good balance of desirable properties. As many data processing tools require a dataset as an input, we propose efficient methods for obtaining synthetic point sets that match the histograms over the overlapping bins.
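To make the baseline in the abstract concrete, the following is a minimal sketch of the equiwidth case only: it covers an axis-aligned query box with the cells of a regular grid over the unit cube and reports the excess volume of that cover. It is not the paper's proposed partitioning; the function names (`covering_cells`, `excess_volume`), the unit-cube domain, and the `cells_per_dim` parameter are assumptions made for illustration.

```python
import math


def covering_cells(box, cells_per_dim):
    """Per-dimension index ranges of grid cells that intersect the box.

    Assumption: the space is the unit cube [0, 1]^d, split into
    cells_per_dim equal-width cells along every dimension.
    box is a list of (lo, hi) pairs with 0 <= lo < hi <= 1.
    """
    ranges = []
    for lo, hi in box:
        first = max(math.floor(lo * cells_per_dim), 0)
        last = min(math.ceil(hi * cells_per_dim) - 1, cells_per_dim - 1)
        ranges.append(range(first, last + 1))
    return ranges


def excess_volume(box, cells_per_dim):
    """Volume of the smallest grid-aligned cover minus the volume of the box."""
    width = 1.0 / cells_per_dim
    cover = math.prod(len(r) * width for r in covering_cells(box, cells_per_dim))
    exact = math.prod(hi - lo for lo, hi in box)
    return cover - exact


if __name__ == "__main__":
    query = [(0.1, 0.6), (0.2, 0.9)]  # hypothetical 2-D query box
    for k in (4, 8):
        print(k, excess_volume(query, k))
```

In this sketch a box whose sides align with the grid has zero excess, while an unaligned box pays for every partially covered cell along its boundary. Refining the grid reduces the excess only at the cost of more cells, which is the trade-off the abstract's objective is concerned with.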
Item Type: | Conference Item (Paper)
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Database management, Data mining, Querying (Computer science), Data structures (Computer science)
Journal or Publication Title: | PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Publisher: | ACM
ISBN: | 9781450383813
Official Date: | 20 June 2021
Page Range: | pp. 285-298
DOI: | 10.1145/3452021.3458316
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Open Access (Creative Commons)
Date of first compliant deposit: | 6 May 2021
Date of first compliant Open Access: | 1 September 2021
Conference Paper Type: | Paper
Title of Event: | The 2021 ACM SIGMOD/PODS Conference
Type of Event: | Conference
Location of Event: | Virtual conference
Date(s) of Event: | 20-25 Jun 2021