Learning to accurately COUNT with query-driven predictive analytics
Anagnostopoulos, C. and Triantafillou, Peter (2015) Learning to accurately COUNT with query-driven predictive analytics. In: 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA, 29 Oct - 1 Nov 2015. Published in: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 pp. 14-23. ISBN 9781479999255.
Research output not available from this repository.
Official URL: http://doi.org/10.1109/BigData.2015.7363736
Abstract
We study a novel solution to executing aggregation (and specifically COUNT) queries over large-scale data. The proposed solution is generally applicable, in the sense that it can be deployed in environments in which data owners may or may not restrict access to their data and allow only 'aggregation operators' to be executed over their data. To this end, it is based on predictive analytics, driven by queries and their results. We propose a machine learning (ML) framework for the task (which can be adapted for different aggregates as well). We focus on the widely used set-cardinality (i.e., COUNT) aggregation operator, as it is a fundamental operator for both internal data system optimisations and aggregation-query analytics. We contribute a novel, query-driven ML model whose goals are to: (i) learn the query space (access patterns), (ii) associate (complex) aggregation queries with the cardinality of their results, and (iii) define query similarity and use it to predict the cardinality of the answer set of an ad-hoc incoming query. Our ML model incorporates incremental learning algorithms for ensuring high prediction accuracy even when both the querying patterns and the underlying data change. The significance of the contribution is that it (i) is the only query-driven solution applicable over general environments which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for big data analytics, and (iii) offers performance (in terms of prediction accuracy, time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity and comparative advantages versus acclaimed data-centric methods (self-tuning histograms, sampling, and multidimensional histograms). © 2015 IEEE.
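The three goals listed in the abstract (learn the query space, attach cardinalities to queries, predict by query similarity) can be illustrated with a minimal sketch. This is not the paper's model, which is not reproduced in this record; it is a simple prototype-based stand-in. It assumes a query is encoded as a numeric vector (for example, the centre and radius of a range query), and the class name, `distance_threshold`, and `learning_rate` are all hypothetical.

```python
import math


class QueryDrivenCountEstimator:
    """Toy estimator that learns only from (query vector, observed COUNT) pairs.

    It never touches the underlying data, mirroring the restricted-access
    setting described in the abstract. Illustrative only.
    """

    def __init__(self, distance_threshold=0.5, learning_rate=0.1):
        # Hypothetical knobs: when to open a new prototype, and how fast
        # existing prototypes adapt to newly observed queries.
        self.distance_threshold = distance_threshold
        self.learning_rate = learning_rate
        self.prototypes = []  # representative query vectors (learned query space)
        self.counts = []      # running cardinality estimate per prototype

    def _nearest(self, query):
        """Return (index, distance) of the prototype closest to the query."""
        distances = [math.dist(query, p) for p in self.prototypes]
        best = min(range(len(distances)), key=distances.__getitem__)
        return best, distances[best]

    def observe(self, query, count):
        """Incrementally learn from one executed query and its true COUNT."""
        if self.prototypes:
            i, d = self._nearest(query)
            if d <= self.distance_threshold:
                lr = self.learning_rate
                # Move the prototype and its cardinality toward the observation.
                self.prototypes[i] = [
                    p + lr * (q - p) for p, q in zip(self.prototypes[i], query)
                ]
                self.counts[i] += lr * (count - self.counts[i])
                return
        # No prototype is similar enough: start a new one.
        self.prototypes.append(list(query))
        self.counts.append(float(count))

    def predict(self, query):
        """Predict COUNT as an inverse-distance-weighted mean over prototypes."""
        if not self.prototypes:
            raise ValueError("no queries observed yet")
        weights = [1.0 / (math.dist(query, p) + 1e-9) for p in self.prototypes]
        total = sum(weights)
        return sum(w * c for w, c in zip(weights, self.counts)) / total


# Usage: each query is (x, y, radius); the counts are made-up training answers.
estimator = QueryDrivenCountEstimator()
estimator.observe([0.0, 0.0, 1.0], 100)
estimator.observe([10.0, 10.0, 1.0], 900)
estimate = estimator.predict([0.2, 0.1, 1.0])
```

Because `observe` keeps adjusting prototypes and their counts, the estimate can follow changes in both the query patterns and the data, which is the role the abstract assigns to incremental learning.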
Item Type: | Conference Item (Paper)
---|---
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Journal or Publication Title: | Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
ISBN: | 9781479999255
Official Date: | 2015
Dates: |
Page Range: | pp. 14-23
Status: | Peer Reviewed
Publication Status: | Published
Reuse Statement (publisher, data, author rights): | cited By 1
Access rights to Published version: | Restricted or Subscription Access
Conference Paper Type: | Paper
Title of Event: | 2015 IEEE International Conference on Big Data (Big Data)
Type of Event: | Conference
Location of Event: | Santa Clara, CA, USA
Date(s) of Event: | 29 Oct - 1 Nov 2015