Revisiting exact kNN query processing with probabilistic data space transformations
Cahsai, Atoshum, Anagnostopoulos, Christos, Ntarmos, Nikos and Triantafillou, Peter (2018) Revisiting exact kNN query processing with probabilistic data space transformations. In: 2018 IEEE International Conference on Big Data (Big Data), Seattle, USA, 10-14 Dec 2018. Published in: 2018 IEEE International Conference on Big Data (Big Data)
PDF: WRAP-revisiting-exact-kNN-query-processing-probabilistic-data-space-transformations-Triantafillou-2018.pdf - Accepted Version (1834Kb)
Official URL: http://cci.drexel.edu/bigdata/bigdata2018/Accepted...
Abstract
The state-of-the-art approaches for scalable kNN query processing utilise big data parallel/distributed platforms (e.g., Hadoop and Spark) and storage engines (e.g., HDFS, NoSQL, etc.), upon which they build (tree-based) indexing methods for efficient query processing. However, as data sizes continue to increase (nowadays it is not uncommon to reach several Petabytes), the storage cost of tree-based index structures becomes exceptionally high. In this work, we propose a novel perspective to organise multivariate (mv) datasets. The main novel idea relies on data space probabilistic transformations and derives a Space Transformation Organisation Structure (STOS) for mv data organisation. STOS facilitates query processing as if underlying datasets were uniformly distributed. This approach bears significant advantages. First, STOS enjoys a minute memory footprint that is many orders of magnitude smaller than indexes in related work. Second, the required memory, unlike related work, increases very slowly with dataset size and, thus, enjoys significantly higher scalability. Third, the STOS structure is relatively efficient to compute, outperforming traditional index building times. The new approach comes bundled with a distributed coordinator-based query processing method so that, overall, lower query processing times are achieved compared to the state-of-the-art index-based methods. We conducted extensive experimentation with real and synthetic datasets of different sizes to substantiate and quantify the performance advantages of our proposal.
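As a rough illustration of why processing queries "as if underlying datasets were uniformly distributed" can be useful, the sketch below applies a generic probability integral transform (an empirical CDF) to one skewed attribute. It is not the paper's STOS construction, which this record does not describe; the function names, the exponential test data, and the bucket count are all assumptions chosen for the example.

```python
import bisect
import random


def fit_empirical_cdf(values):
    """Return the sorted sample; this is all the empirical CDF needs."""
    return sorted(values)


def to_uniform(sorted_sample, x):
    """Map x into [0, 1] as the fraction of sample points <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def bucket_counts(points, n_buckets):
    """Count points (assumed to lie in [0, 1]) per equal-width bucket."""
    counts = [0] * n_buckets
    for u in points:
        counts[min(int(u * n_buckets), n_buckets - 1)] += 1
    return counts


# Hypothetical skewed attribute: exponentially distributed values.
random.seed(0)
skewed = [random.expovariate(1.0) for _ in range(10_000)]

# Equal-width buckets over the raw (rescaled) values are badly unbalanced.
largest = max(skewed)
raw_counts = bucket_counts([x / largest for x in skewed], 10)

# After the transform, the same equal-width buckets are nearly balanced.
cdf = fit_empirical_cdf(skewed)
uniform_counts = bucket_counts([to_uniform(cdf, x) for x in skewed], 10)

print("raw:        ", raw_counts)
print("transformed:", uniform_counts)
```

In this sketch the transformed values fill a plain equal-width grid evenly, so locating the bucket of a query point is simple arithmetic rather than a tree traversal. How the paper extends this kind of idea to multivariate data and to exact kNN answers is described in the full text, not here.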
Item Type: | Conference Item (Paper) |
---|---|
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software |
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science |
Library of Congress Subject Headings (LCSH): | Big data, Electronic data processing -- Distributed processing, Parallel processing (Electronic computers) |
Journal or Publication Title: | 2018 IEEE International Conference on Big Data (Big Data) |
Publisher: | IEEE |
Official Date: | 25 October 2018 |
Status: | Peer Reviewed |
Publication Status: | Published |
Reuse Statement (publisher, data, author rights): | © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Access rights to Published version: | Restricted or Subscription Access |
Date of first compliant deposit: | 3 January 2019 |
Date of first compliant Open Access: | 8 January 2019 |
Conference Paper Type: | Paper |
Title of Event: | 2018 IEEE International Conference on Big Data (Big Data) |
Type of Event: | Conference |
Location of Event: | Seattle, USA |
Date(s) of Event: | 10-14 Dec 2018 |