
An efficient task-based all-reduce for machine learning applications
Li, Zhenyu, Davis, James A. and Jarvis, Stephen A. (2017) An efficient task-based all-reduce for machine learning applications. In: Machine Learning on HPC Environments, 12-17 Nov 2017. Published in: Proceedings of the Machine Learning on HPC Environments (MLHPC'17). New York, NY: ACM. ISBN 9781450351379. doi:10.1145/3146347.3146350
PDF: WRAP-efficient-task-based-all-reduce-machine-learning-applications-Li-2017.pdf (Accepted Version, 1402Kb)
Official URL: http://dx.doi.org/10.1145/3146347.3146350
Abstract
All-Reduce is a collective-combine operation frequently utilised in synchronous parameter updates in parallel machine learning algorithms. The performance of this operation - and subsequently of the algorithm itself - is heavily dependent on its implementation, configuration and on the supporting hardware on which it is run. Given the pivotal role of all-reduce, a failure in any of these regards will significantly impact the resulting scientific output.
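For readers unfamiliar with the operation, the following is a minimal sketch (in Scala, and not taken from the paper) of the baseline reduce-broadcast pattern referred to above: each worker contributes a local vector, a single root combines them element-wise, and the combined result is then copied back so that every worker holds the same value.

```scala
// Illustrative sketch only (not the paper's code): all-reduce semantics
// expressed as reduce-then-broadcast. Each worker contributes a local
// gradient vector; a root sums them element-wise and the result is copied
// back, so every worker ends up with the same combined vector.
object ReduceBroadcastSketch {
  def allReduce(gradients: Seq[Array[Double]]): Seq[Array[Double]] = {
    val sum = gradients.reduce { (a, b) =>
      a.zip(b).map { case (x, y) => x + y }   // element-wise combine at the root
    }
    gradients.map(_ => sum.clone)             // "broadcast": every worker receives a copy
  }

  def main(args: Array[String]): Unit = {
    val perWorker = Seq(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    allReduce(perWorker).foreach(v => println(v.mkString("[", ", ", "]"))) // all print [9.0, 12.0]
  }
}
```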
In this research we explore the performance of alternative all-reduce algorithms in data-flow graphs and compare these to the commonly used reduce-broadcast approach. We present an architecture and interface for all-reduce in task-based frameworks, and a parallelization scheme for object-serialization and computation. We present a concrete, novel application of a butterfly all-reduce algorithm on the Apache Spark framework on a high-performance compute cluster, and demonstrate that the new butterfly algorithm achieves a logarithmic speed-up with respect to the vector length compared with the original reduce-broadcast method - a 9x speed-up is observed for vector lengths in the order of 10⁸. This improvement comprises both algorithmic changes (65%) and parallel-processing optimization (35%).
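The butterfly (recursive-doubling) pattern mentioned above can be sketched as follows. This is an illustrative local simulation assuming a power-of-two worker count, not the paper's Spark implementation or its interface: in round r each worker combines its partial vector with the worker whose rank differs in bit r, so all p workers hold the combined result after log2(p) rounds.

```scala
// A minimal sketch, assuming 2^k workers, of a butterfly all-reduce simulated
// locally (illustration only; not the paper's Spark code). In each round a
// worker exchanges and sums its partial vector with the partner whose rank
// differs in the current bit; after log2(p) rounds every worker holds the
// full element-wise sum.
object ButterflyAllReduceSketch {
  def allReduce(vectors: Array[Array[Double]]): Array[Array[Double]] = {
    val p = vectors.length
    require(p > 0 && (p & (p - 1)) == 0, "worker count must be a power of two")
    var current = vectors.map(_.clone)
    var step = 1
    while (step < p) {
      val next = Array.tabulate(p) { rank =>
        val partner = rank ^ step               // exchange partner for this round
        current(rank).zip(current(partner)).map { case (a, b) => a + b }
      }
      current = next
      step *= 2                                 // double the exchange distance
    }
    current
  }

  def main(args: Array[String]): Unit = {
    val workers = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0), Array(7.0, 8.0))
    allReduce(workers).foreach(v => println(v.mkString("[", ", ", "]"))) // all print [16.0, 20.0]
  }
}
```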
The effectiveness of the new butterfly all-reduce is demonstrated using real-world neural network applications with the Spark framework. For the model-update operation we observe significant speed-ups using the new butterfly algorithm compared with the original reduce-broadcast, for both smaller (CIFAR and MNIST) and larger (ImageNet) datasets.
| Field | Value |
|---|---|
| Item Type | Conference Item (Paper) |
| Subjects | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software |
| Divisions | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Library of Congress Subject Headings (LCSH) | Machine learning, Computer algorithms, Parallel programming (Computer science), Parallel processing (Electronic computers), Parallel algorithms, Electronic data processing -- Distributed processing |
| Journal or Publication Title | Proceedings of the Machine Learning on HPC Environments (MLHPC'17) |
| Publisher | ACM |
| ISBN | 9781450351379 |
| Book Title | Proceedings of the Machine Learning on HPC Environments - MLHPC'17 |
| Official Date | 12 November 2017 |
| DOI | 10.1145/3146347.3146350 |
| Status | Peer Reviewed |
| Publication Status | Published |
| Access rights to Published version | Restricted or Subscription Access |
| Date of first compliant deposit | 5 December 2017 |
| Date of first compliant Open Access | 6 December 2017 |
| Funder | Atos |
| Conference Paper Type | Paper |
| Title of Event | Machine Learning on HPC Environments |
| Type of Event | Conference |
| Location of Event | ACM New York, NY, USA |
| Date(s) of Event | 12-17 Nov 2017 |