![University of Warwick](/images/warwick_uni_black.gif)
The Library
Learned joins with probabilistic graphical models
Tools
Shanghooshabad, A. M. (2022) Learned joins with probabilistic graphical models. PhD thesis, University of Warwick.
|
PDF
WRAP_Mohammadi_Shanghooshabad_2022.pdf - Submitted Version - Requires a PDF viewer. Download (3655Kb) | Preview |
Official URL: http://webcat.warwick.ac.uk/record=b3912241~S15
Abstract
Because of the normalization in relational databases, joins are ever-present in analytical environments. However, joining multiple large tables raises critical challenges for modern databases as the operation is both time- and resource-consuming.
In this work, we investigate the use of Machine Learning (ML) and statistical models to reduce join costs at multiple levels. More specifically, we look at how learning the distribution of the join result might help us deal with the costs of joins. The distribution should be learned without generating the join result; otherwise it defeats the point. To do so, we map the join problem into Probabilistic Graphical Models (PGMs) in order to derive the factorized distribution of the join result. This is accomplished simply by scanning the tables once. Then, using the factorized distribution and tweaked versions of PGMs’ algorithms that are efficient, principled, and easy-to-understand, we propose three different PGM-based solutions to three different problems with joins.
First, we present the PGM-Join sampler, a PGM-based algorithm for generating a uniform and independent sample of the join result without producing the join result, and we demonstrate that our solution is up to 28 times faster than the state-of-the-art join sampling methods.
Second, we propose Graphical Join (GJ), a new physical join algorithm. GJ generates a frequency-based summary of the join result, which is subsequently de-summarized to retrieve the exact join result (after optionally storing and loading the summary from disk). GJ significantly reduces the time and storage cost of joins.
Furthermore, while doing our research on joins, we found out that a new challenge in ML-driven modern relational databases is emerging, called the model join problem. For several reasons, sometimes the tables may not be accessible, but models of the tables are. So, how can one join the models rather than the tables? We define the model join problem for the first time and suggest the first solution for that which is PGM-based.
Item Type: | Thesis (PhD) | ||||
---|---|---|---|---|---|
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software | ||||
Library of Congress Subject Headings (LCSH): | Machine learning, Algorithms, Graphical modeling (Statistics), Probabilities, Bayesian statistical decision theory -- Graphic methods | ||||
Official Date: | October 2022 | ||||
Dates: |
|
||||
Institution: | University of Warwick | ||||
Theses Department: | Department of Computer Science | ||||
Thesis Type: | PhD | ||||
Publication Status: | Unpublished | ||||
Supervisor(s)/Advisor: | Triantafillou, Peter | ||||
Sponsors: | Engineering and Physical Sciences Research Council | ||||
Extent: | xi, 127 pages : illustrations, charts. | ||||
Language: | eng |
Request changes or add full text files to a record
Repository staff actions (login required)
![]() |
View Item |