Detect, distill and update : learned DB systems facing out of distribution data
Kurmanji, M. and Triantafillou, Peter (2023) Detect, distill and update : learned DB systems facing out of distribution data. In: ACM SIGMOD Conference on the Management of Data, (SIGMOD23), Seattle, WA, 18-23 Jun 2023. Published in: Proceedings of the ACM on Management of Data, 1 (1). pp. 1-27. doi:10.1145/3588713
Official URL: https://doi.org/10.1145/3588713
Abstract
Machine Learning (ML) is changing DBs as many DB components are being replaced by ML models. One open problem in this setting is how to update such ML models in the presence of data updates. We start this investigation by focusing on data insertions (the dominant kind of update in analytical DBs). We study how to update neural network (NN) models when new data follows a different distribution (i.e., it is "out-of-distribution", or OOD), rendering previously trained NNs inaccurate. A requirement in our problem setting is that learned DB components should ensure high accuracy for tasks on both old and new data (e.g., for approximate query processing (AQP), cardinality estimation (CE), and synthetic data generation (DG)). This paper proposes a novel updatability framework (DDUp). DDUp can provide updatability for different learned DB system components, even when they are based on different NNs, without the high cost of retraining the NNs from scratch. DDUp entails two components: first, a novel, efficient, and principled statistical-testing approach to detect OOD data; second, a novel model-updating approach, grounded in the principles of transfer learning with knowledge distillation, to update learned models efficiently while still ensuring high accuracy. We develop and showcase DDUp's applicability for three different learned DB components, AQP, CE, and DG, each employing a different type of NN. A detailed experimental evaluation using real and benchmark datasets for AQP, CE, and DG details DDUp's performance advantages.
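The abstract names two steps, a statistical test to detect OOD data and a distillation-based update, but does not give the test statistic or the loss. The Python sketch below is therefore only a hypothetical illustration of the general idea, not DDUp's actual algorithm. It assumes a toy Gaussian density standing in for the learned model, a bootstrap z-score on mean loss as the test, and a squared-error distillation term. The names `detect_ood` and `distillation_objective` and the parameters `n_boot`, `z_threshold`, and `alpha` are invented for this example.

```python
import math
import random
import statistics


def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under N(mu, sigma^2).

    Stands in for the loss of a learned model (an assumption of this sketch).
    """
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)


def mean_loss(batch, mu, sigma):
    """Average model loss over a batch of values."""
    return statistics.fmean(gaussian_nll(x, mu, sigma) for x in batch)


def detect_ood(old_data, new_batch, n_boot=500, z_threshold=3.0, seed=0):
    """Flag new_batch as OOD when its mean loss lies far outside the bootstrap
    distribution of mean losses on same-sized resamples of old_data.

    Returns (is_ood, z_score).
    """
    rng = random.Random(seed)
    # "Train" the toy model on the old data.
    mu = statistics.fmean(old_data)
    sigma = statistics.pstdev(old_data)
    k = len(new_batch)
    # Bootstrap the mean loss the model achieves on in-distribution samples.
    boot = [mean_loss(rng.choices(old_data, k=k), mu, sigma) for _ in range(n_boot)]
    z = (mean_loss(new_batch, mu, sigma) - statistics.fmean(boot)) / statistics.stdev(boot)
    return abs(z) > z_threshold, z


def distillation_objective(student_old, teacher_old, student_new, target_new, alpha=0.5):
    """Weighted update objective (hypothetical form).

    alpha weights agreement with the old (teacher) model on old data;
    (1 - alpha) weights the fit to targets on new data.
    """
    distill = statistics.fmean((s - t) ** 2 for s, t in zip(student_old, teacher_old))
    task = statistics.fmean((s - y) ** 2 for s, y in zip(student_new, target_new))
    return alpha * distill + (1 - alpha) * task


# Usage: old data centred at 0, inserted batch centred at 3.
rng = random.Random(1)
old = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted = [rng.gauss(3.0, 1.0) for _ in range(200)]
is_ood, z = detect_ood(old, shifted)
print("OOD detected:", is_ood)
```

In this sketch, an update would be triggered only when `detect_ood` returns a positive flag. The `alpha` weight then trades off retaining accuracy on old data against fitting the newly inserted data.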
Item Type: | Conference Item (Paper) | ||||||
---|---|---|---|---|---|---|---|
Alternative Title: | Learned DB systems facing out of distribution data | ||||||
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software | ||||||
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science | ||||||
Library of Congress Subject Headings (LCSH): | Transfer learning (Machine learning), Neural networks (Computer science), Database management | ||||||
Journal or Publication Title: | Proceedings of the ACM on Management of Data | ||||||
Publisher: | ACM | ||||||
Official Date: | 30 May 2023 | ||||||
Volume: | 1 | ||||||
Number: | 1 | ||||||
Page Range: | pp. 1-27 | ||||||
Article Number: | 33 | ||||||
DOI: | 10.1145/3588713 | ||||||
Status: | Peer Reviewed | ||||||
Publication Status: | Published | ||||||
Re-use Statement: | © ACM, 2023. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the ACM on Management of Data, 1(1) 2023. http://doi.acm.org/10.1145/3588713 | ||||||
Access rights to Published version: | Free Access (unspecified licence, 'bronze OA') | ||||||
Copyright Holders: | Copyright © 2023 ACM | ||||||
Date of first compliant deposit: | 19 December 2022 | ||||||
Date of first compliant Open Access: | 27 October 2023 | ||||||
Conference Paper Type: | Paper | ||||||
Title of Event: | ACM SIGMOD Conference on the Management of Data, (SIGMOD23) | ||||||
Type of Event: | Conference | ||||||
Location of Event: | Seattle, WA | ||||||
Date(s) of Event: | 18-23 Jun 2023 | ||||||