Rank join queries in NoSQL databases
Ntarmos, N., Patlakas, I. and Triantafillou, Peter (2014) Rank join queries in NoSQL databases. Proceedings of the VLDB Endowment, 7 (7). pp. 493-504. doi:10.14778/2732286.2732287 ISSN 2150-8097.
Full text: p493-ntarmos.pdf (Published Version, 499Kb). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Official URL: http://doi.org/10.14778/2732286.2732287
Abstract
Rank (i.e., top-k) join queries play a key role in modern analytics tasks. However, despite their importance and unlike centralized settings, they have been completely overlooked in cloud NoSQL settings. We attempt to fill this gap: We contribute a suite of solutions and study their performance comprehensively. Baseline solutions are offered using SQL-like languages (like Hive and Pig), based on MapReduce jobs. We first provide solutions that are based on specialized indices, which may themselves be accessed using either MapReduce or coordinator-based strategies. The first index-based solution is based on inverted indices, which are accessed with MapReduce jobs. The second index-based solution adapts a popular centralized rank-join algorithm. We further contribute a novel statistical structure comprising histograms and Bloom filters, which forms the basis for the third index-based solution. We provide (i) MapReduce algorithms showing how to build these indices and statistical structures, (ii) algorithms to allow for online updates to these indices, and (iii) query processing algorithms utilizing them. We implemented all algorithms in Hadoop (HDFS) and HBase and tested them on TPC-H datasets of various scales, utilizing different queries on tables of various sizes and different score-attribute distributions. We ported our implementations to Amazon EC2 and in-house lab clusters of various scales. We provide performance results for three metrics: query execution time, network bandwidth consumption, and dollar-cost for query execution. © 2014 VLDB Endowment.
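For readers unfamiliar with the term, the following is a minimal single-machine sketch of what a rank (top-k) join query computes. It is not one of the algorithms from the paper: it uses a naive hash join followed by a top-k selection, and it assumes the aggregate score is the sum of the two input scores. The function name `rank_join` and the sample relations `orders` and `parts` are hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict


def rank_join(left, right, k):
    """Return the k join results with the highest combined score.

    left, right: iterables of (join_key, score) tuples.
    The combined score is assumed to be the sum of the two scores.
    """
    # Build a hash index on the right relation, keyed by join attribute.
    index = defaultdict(list)
    for key, score in right:
        index[key].append(score)

    # Lazily enumerate every join result with its combined score.
    joined = (
        (l_score + r_score, key)
        for key, l_score in left
        for r_score in index.get(key, ())
    )

    # Keep only the k highest-scoring results, best first.
    return heapq.nlargest(k, joined)


# Hypothetical sample data: (join_key, score)
orders = [("a", 9), ("b", 8), ("c", 1)]
parts = [("a", 5), ("b", 9), ("d", 10)]

print(rank_join(orders, parts, 2))  # [(17, 'b'), (14, 'a')]
```

This naive approach materializes every join result before ranking. The index-based and statistics-based solutions described in the abstract presumably aim to avoid that cost in a distributed setting.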
Item Type: | Journal Article
---|---
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Journal or Publication Title: | Proceedings of the VLDB Endowment
Publisher: | ACM
ISSN: | 2150-8097
Official Date: | 1 March 2014
Volume: | 7
Number: | 7
Page Range: | pp. 493-504
DOI: | 10.14778/2732286.2732287
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Copyright Holders: | VLDB Endowment
Date of first compliant deposit: | 7 July 2019
Date of first compliant Open Access: | 7 July 2019
Embodied As: | 1
Conference Paper Type: | Paper
Title of Event: | 40th International Conference on Very Large Data Bases
Type of Event: | Conference
Location of Event: | Hangzhou, China