
PGMJoins : random join sampling with graphical models
Shanghooshabad, A. M., Kurmanji, M., Ma, Q., Shekelyan, Michael, Almasi, Mehrdad and Triantafillou, Peter (2021) PGMJoins : random join sampling with graphical models. In: ACM Sigmod Conference on the Management of Data, Virtual, 20-25 Jun 2021. Published in: SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data pp. 1610-1622. ISBN 9781450383431. doi:10.1145/3448016.3457302
Official URL: https://doi.org/10.1145/3448016.3457302
Abstract
Modern databases face formidable challenges when called to join (several) massive tables. Joins (especially when entailing many-to-many joins) are very time- and resource-consuming, join results can be too big to keep in memory, and performing analytics/learning tasks over them costs dearly in terms of time, resources, and money (in the cloud). Moreover, although random sampling is a promising idea to mitigate the above problems, the current state of the art leaves lots of room for improvements. With this paper we contribute a principled solution, coined PGMJoins. PGMJoins adapts Probabilistic Graphical Models to deriving provably random samples of the join result for (n-way) key joins, many-to-many joins, and cyclic and acyclic joins. PGMJoins contributes optimizations both for deriving the structure of the graph and for PGM inference. It also contributes a novel Sum-Product Message Passing Algorithm (SP-MPA) to make a uniform sample of the joint distribution (join result) efficiently and a novel way to deal with cyclic joins. Despite the use of PGMs, the learned joint distribution is not approximated, and the uniform samples are drawn from the true distribution. Our experimentation using queries and datasets from TPC-H, JOB, TPC-DS, and Twitter shows PGMJoins to outperform the state of the art (by 2X-28X).
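The abstract's central idea, drawing uniform samples of a join result without materializing it, can be illustrated with a small sketch. The code below is not the SP-MPA algorithm from the paper; it is a generic count-propagation scheme for an acyclic chain join, given here only as an assumption-laden illustration. The tables `R(a, b)`, `S(b, c)`, `T(c, d)`, the function names `build_weights` and `sample_join`, and the example data are all hypothetical. Counts are passed backward along the chain (a sum-product style pass), and a tuple is then sampled forward with probability proportional to those counts, so every join result is equally likely.

```python
import random
from collections import defaultdict

def build_weights(R, S, T):
    """Backward pass over the chain R(a,b) - S(b,c) - T(c,d).

    Each row's weight is the number of join results it takes part in
    downstream. Rows with weight 0 are dropped.
    """
    # Index T rows by their join key c.
    t_by_c = defaultdict(list)
    for row in T:
        t_by_c[row[0]].append(row)

    # Weight of an S row = number of matching T rows; index by join key b.
    s_by_b = defaultdict(list)
    for row in S:
        w = len(t_by_c.get(row[1], ()))
        if w:
            s_by_b[row[0]].append((row, w))

    # Weight of an R row = sum of the weights of its matching S rows.
    r_rows = []
    for row in R:
        w = sum(sw for _, sw in s_by_b.get(row[1], ()))
        if w:
            r_rows.append((row, w))
    return r_rows, s_by_b, t_by_c

def sample_join(r_rows, s_by_b, t_by_c, rng):
    """Forward pass: draw one join result uniformly at random."""
    rows, weights = zip(*r_rows)
    r = rng.choices(rows, weights=weights, k=1)[0]
    s_rows, s_weights = zip(*s_by_b[r[1]])
    s = rng.choices(s_rows, weights=s_weights, k=1)[0]
    t = rng.choice(t_by_c[s[1]])
    return r + s + t

# Hypothetical example data, not taken from the paper.
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (10, 200), (20, 100), (30, 300)]
T = [(100, "x"), (100, "y"), (200, "z")]

r_rows, s_by_b, t_by_c = build_weights(R, S, T)
join_size = sum(w for _, w in r_rows)  # size of the join, never materialized
print(join_size, sample_join(r_rows, s_by_b, t_by_c, random.Random(0)))
```

The probability of a given result is (w_r / total) · (w_s / w_r) · (1 / w_s) = 1 / total, which is why the draw is uniform. Cyclic and many-to-many joins over arbitrary schemas, which the paper addresses, are outside the scope of this sketch.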
Item Type: | Conference Item (Paper)
---|---
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Journal or Publication Title: | SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data
Publisher: | ACM
ISBN: | 9781450383431
Official Date: | 9 June 2021
Page Range: | pp. 1610-1622
DOI: | 10.1145/3448016.3457302
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Is Part Of: | 1
Conference Paper Type: | Paper
Title of Event: | ACM Sigmod Conference on the Management of Data
Type of Event: | Conference
Location of Event: | Virtual
Date(s) of Event: | 20-25 Jun 2021