Large-scale data exploration using explanatory regression functions
Savva, Fotis, Anagnostopoulos, Christos, Triantafillou, Peter and Kolomvatsos, Kostas (2020) Large-scale data exploration using explanatory regression functions. ACM Transactions on Knowledge Discovery from Data (TKDD), 14 (6). Article 76, pp. 1-33. doi:10.1145/3410448 ISSN 1556-4681.
Research output not available from this repository.
Official URL: http://dx.doi.org/10.1145/3410448
Abstract
Analysts wishing to explore multivariate data spaces typically issue queries involving selection operators, i.e., range or equality predicates, which define data subspaces of potential interest. They then use aggregation functions, the results of which determine a subspace’s interestingness for further exploration and deeper analysis. However, Aggregate Query (AQ) results are scalars and convey limited information and explainability about the queried subspaces for enhanced exploratory analysis. Analysts have no way of identifying how these results are derived or how they change w.r.t. query (input) parameter values. We address this shortcoming by contributing a novel explanation mechanism, based on machine learning, that helps analysts explore and understand data subspaces. We explain AQ results using functions obtained from a three-fold joint optimization problem, which assume the form of explainable piecewise-linear regression functions. A key feature of the proposed solution is that the explanation functions are estimated using past executed queries. These queries provide a coarse-grained overview of the underlying aggregate function (generating the AQ results) to be learned. Explanations for future, previously unseen AQs can be computed without accessing the underlying data and can be used to further explore the queried data subspaces, without issuing more queries to the backend analytics engine. We evaluate the explanation accuracy and efficiency through theoretically grounded metrics over real-world and synthetic datasets and query workloads.
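The idea of learning a piecewise-linear explanation function from past queries can be illustrated with a small sketch. This is not the paper's three-fold joint optimization; it swaps in ordinary least squares over hinge basis functions with fixed, hand-picked knots. The synthetic dataset, the fixed query radius, the COUNT aggregate, and every function name below are assumptions made purely for illustration.

```python
import numpy as np

def hinge_features(x, knots):
    """Design matrix [1, x, max(0, x - k) for each knot] of a continuous piecewise-linear function."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

def fit_piecewise_linear(x, y, knots):
    """Least-squares coefficients of the hinge model (a simple stand-in, not the paper's optimizer)."""
    coef, *_ = np.linalg.lstsq(hinge_features(x, knots), np.asarray(y, dtype=float), rcond=None)
    return coef

def predict(x, knots, coef):
    """Evaluate the fitted explanation function at new query parameters."""
    return hinge_features(x, knots) @ coef

# Hypothetical backend: a 1-D dataset answering COUNT over the range [center - radius, center + radius].
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=1.5, size=10_000)
radius = 1.0

def count_query(center):
    return int(np.sum(np.abs(data - center) <= radius))

# "Past executed queries": (query parameter, aggregate result) pairs logged earlier.
past_centers = rng.uniform(0.0, 10.0, size=200)
past_results = np.array([count_query(c) for c in past_centers])

# Fit the explanation function once, using only the query log.
knots = np.linspace(1.0, 9.0, 9)
coef = fit_piecewise_linear(past_centers, past_results, knots)

# Estimate results for unseen queries without touching `data` again.
new_centers = np.array([3.0, 5.0, 7.5])
estimates = predict(new_centers, knots, coef)
```

Once `coef` is fitted, an analyst could sweep `predict` over a grid of centers to see how the aggregate changes with the query parameter, which is the kind of exploration the abstract describes, under the simplifications stated above.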
| Item Type: | Journal Article |
|---|---|
| Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Journal or Publication Title: | ACM Transactions on Knowledge Discovery from Data (TKDD) |
| Publisher: | ACM |
| ISSN: | 1556-4681 |
| Official Date: | September 2020 |
| Volume: | 14 |
| Number: | 6 |
| Page Range: | pp. 1-33 |
| Article Number: | 76 |
| DOI: | 10.1145/3410448 |
| Status: | Peer Reviewed |
| Publication Status: | Published |
| Access rights to Published version: | Restricted or Subscription Access |