The Library
Approximate query processing using machine learning
Tools
Ma, Qingzhi (2021) Approximate query processing using machine learning. PhD thesis, University of Warwick.
|
PDF
WRAP_Theses_Ma_2021.pdf - Submitted Version - Requires a PDF viewer. Download (3866Kb) | Preview |
Official URL: http://webcat.warwick.ac.uk/record=b3733307
Abstract
In the era of big data, the volume of collected data grows faster than the growth of computational power. And it becomes prohibitively expensive to compute the exact answers to analytical queries. This greatly increases the value of approaches that can compute efficiently approximate, but highly accurate, answers to analytical queries. Approximate query processing (AQP) aims to reduce the query latency and memory footprints at the cost of small quality losses. Previous efforts on AQP largely rely on samples or sketches, etc. However, trade-offs between query response time (or memory footprint) and accuracy are unavoidable. Specifically, to guarantee higher accuracy, a large sample is usually generated and maintained, which leads to increased query response time and space overheads.
In this thesis, we aim to overcome the drawbacks of current AQP solutions by applying machine learning models. Instead of accessing data (or samples of it), models are used to make predictions. Our model-based AQP solutions are developed and improved in three stages, and are described as follows:
1. We firstly investigate potential regression models for AQP and propose the query-centric regression, coined QReg. QReg is an ensemble method based on regression models. It achieves better accuracy than the state-of- the-art regression models and overcomes the generalization-overfit dilemma when employing machine learning models within DBMSs.
2. We introduce the first AQP engine DBEst based on classical machine learning models. Specifically, regression models and density estimators are trained over the data/samples, and are further combined to produce the final approximate answers.
3. We further improve DBEst by replacing classical machine learning models with deep learning networks and word embedding. This overcomes the drawbacks of queries with large groups, and query response time and space overheads are further reduced.
We conduct experiments against the state-of-the-art AQP engines over various datasets, and show that our method achieves better accuracy while offering orders of magnitude savings in space overheads and query response time.
Item Type: | Thesis (PhD) | ||||
---|---|---|---|---|---|
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software | ||||
Library of Congress Subject Headings (LCSH): | Querying (Computer science), Big data, Machine learning, Regression analysis -- Data processing, Database management | ||||
Official Date: | June 2021 | ||||
Dates: |
|
||||
Institution: | University of Warwick | ||||
Theses Department: | Department of Computer Science | ||||
Thesis Type: | PhD | ||||
Publication Status: | Unpublished | ||||
Supervisor(s)/Advisor: | Triantafillou, Peter, 1963- | ||||
Format of File: | |||||
Extent: | xviii, 150 leaves : illustrations | ||||
Language: | eng |
Request changes or add full text files to a record
Repository staff actions (login required)
View Item |
Downloads
Downloads per month over past year