MapRDD : finer grained resilient distributed dataset for machine learning
Li, Zhenyu and Jarvis, Stephen A. (2018) MapRDD : finer grained resilient distributed dataset for machine learning. In: BeyondMR’18 : Algorithms and Systems for MapReduce and Beyond, Houston, TX, USA, 15 Jun 2018. Published in: Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond ISBN 9781450357036. doi:10.1145/3206333.3206335
PDF: WRAP-MapRDD-finer-grained-resilient-distributed-dataset-machine-learning-Li-2018.pdf - Accepted Version (995Kb)
Official URL: http://dx.doi.org/10.1145/3206333.3206335
Abstract
The Resilient Distributed Dataset (RDD) is the core memory abstraction behind the popular data-analytic framework Apache Spark. We present an extension to the Resilient Distributed Dataset for map transformations, which we call MapRDD. It takes advantage of the underlying relations between records in the parent and child datasets to achieve random access to individual records in a partition. The design is complemented by a new MemoryStore, which manages data sampling and data transfers asynchronously. We use the ImageNet dataset to demonstrate that: (I) the initial data loading phase is redundant and can be completely avoided; (II) sampling on the CPU can be entirely overlapped with training on the GPU to achieve near full occupancy; (III) CPU processing cycles and memory usage can be reduced by more than 90%, allowing other applications to be run simultaneously; (IV) constant training step time can be achieved, regardless of the size of the partition, for up to 1.3 million records in our experiments. We expect to obtain the same improvements in other RDD transformations via further research on finer-grained implicit and explicit dataset relations.
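The record-level relation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and does not use Spark's API: the class `LazyMappedPartition`, its `sample` method, and the `calls` counter are hypothetical names introduced only for illustration. The sketch assumes that a map transformation is one-to-one, so child record i depends only on parent record i, and a sampled mini-batch can be produced by transforming just the sampled records instead of materializing the whole partition first.

```python
import random
from typing import Any, Callable, List, Sequence


class LazyMappedPartition:
    """Toy stand-in for a map-transformed partition (hypothetical, not Spark's API)."""

    def __init__(self, parent: Sequence[Any], fn: Callable[[Any], Any]) -> None:
        self.parent = parent  # records of the parent partition
        self.fn = fn          # the map function applied per record
        self.calls = 0        # how many records were actually transformed

    def __len__(self) -> int:
        return len(self.parent)

    def __getitem__(self, index: int) -> Any:
        # One-to-one relation: child record `index` needs only parent record `index`.
        self.calls += 1
        return self.fn(self.parent[index])

    def sample(self, batch_size: int, rng: random.Random) -> List[Any]:
        # Draw distinct record indices, then transform only those records.
        indices = rng.sample(range(len(self)), batch_size)
        return [self[i] for i in indices]


# A 1000-record parent partition with a squaring map function.
partition = LazyMappedPartition(list(range(1000)), lambda x: x * x)
batch = partition.sample(32, random.Random(0))
```

In this toy setting the cost of drawing a batch depends on the batch size rather than on the partition size, which is the intuition behind the constant training step time reported in point (IV). The asynchronous MemoryStore and the CPU/GPU overlap are not modelled here.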
Item Type: | Conference Item (Paper)
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Machine learning, Graphics processing units
Journal or Publication Title: | Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond
Publisher: | ACM
ISBN: | 9781450357036
Book Title: | Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond - BeyondMR'18
Official Date: | 2018
Article Number: | 3
DOI: | 10.1145/3206333.3206335
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Date of first compliant deposit: | 21 June 2018
Date of first compliant Open Access: | 21 June 2018
Conference Paper Type: | Paper
Title of Event: | BeyondMR’18 : Algorithms and Systems for MapReduce and Beyond
Type of Event: | Conference
Location of Event: | Houston, TX, USA
Date(s) of Event: | 15 Jun 2018