Towards unified secure on- and off-line analytics at scale
Coetzee, Peter, Leeke, Matthew and Jarvis, Stephen A. (2014) Towards unified secure on- and off-line analytics at scale. Parallel Computing, Volume 40 (Number 10). pp. 738-753. doi:10.1016/j.parco.2014.07.004 ISSN 0167-8191.
PDF: WRAP_Coetzee_1-s2.0-S0167819114000842-main.pdf - Published Version (2848Kb). Available under License Creative Commons Attribution.
PDF: WRAP_Coetzee_1257419-cs-200814-crucible.pdf - Accepted Version (744Kb). Embargoed item; access restricted to Repository staff only.
Official URL: http://dx.doi.org/10.1016/j.parco.2014.07.004
Abstract
Data scientists have applied various analytic models and techniques to address the oft-cited problems of large volume, high velocity data rates and diversity in semantics. Such approaches have traditionally employed analytic techniques in a streaming or batch processing paradigm. This paper presents CRUCIBLE, a first-in-class framework for the analysis of large-scale datasets that exploits both streaming and batch paradigms in a unified manner. The CRUCIBLE framework includes a domain specific language for describing analyses as a set of communicating sequential processes, a common runtime model for analytic execution in multiple streamed and batch environments, and an approach to automating the management of cell-level security labelling that is applied uniformly across runtimes. This paper shows the applicability of CRUCIBLE to a variety of state-of-the-art analytic environments, and compares a range of runtime models for their scalability and performance against a series of native implementations. The work demonstrates the significant impact of runtime model selection, including improvements of between 2.3× and 480× between runtime models, with an average performance gap of just 14× between CRUCIBLE and a suite of equivalent native implementations.
Item Type: | Journal Article
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Electronic data processing, Big data
Journal or Publication Title: | Parallel Computing
Publisher: | Elsevier Science BV
ISSN: | 0167-8191
Official Date: | December 2014
Volume: | 40
Number: | 10
Page Range: | 738-753
DOI: | 10.1016/j.parco.2014.07.004
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Open Access (Creative Commons)
Date of first compliant deposit: | 27 December 2015
Date of first compliant Open Access: | 27 December 2015
Funder: | Engineering and Physical Sciences Research Council (EPSRC)
Embodied As: | 1