
Scaling up stochastic gradient descent for non-convex optimisation
Mohamad, Saad, Alamri, Hamad and Bouchachia, Abdelhamid (2022) Scaling up stochastic gradient descent for non-convex optimisation. Machine Learning, 111, pp. 4039-4079. doi:10.1007/s10994-022-06243-3. ISSN 0885-6125.
PDF: WRAP-Scaling-up-stochastic-gradient-descent-for-non-convex-optimisation-Alamri-2022.pdf (Published Version, 3542 KB). Available under License Creative Commons Attribution 4.0.
Official URL: http://dx.doi.org/10.1007/s10994-022-06243-3
Abstract
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions and large datasets. We address the bottleneck problem arising when using both shared and distributed memory. Typically, the former is bounded by limited computation resources and bandwidth, whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining the two strategies in a unified framework, DPSGD is able to strike a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical modelling and machine learning. Our theoretical analysis shows that DPSGD achieves speed-up in both the number of cores and the number of workers while guaranteeing an asymptotic convergence rate of O(1/√T), provided that the number of cores is bounded by T^{1/4} and the number of workers is bounded by T^{1/2}, where T is the number of iterations. The potential gains of DPSGD are demonstrated empirically on a stochastic variational inference problem (Latent Dirichlet Allocation) and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C), resulting in two algorithms: DPSVI and HSA2C. Empirical results validate our theoretical findings. Comparative studies show the performance of the proposed DPSGD against state-of-the-art DRL algorithms.
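To make the two-level scheme in the abstract concrete, below is a minimal, hypothetical sketch: lock-free (Hogwild!-style) updates by several threads ("cores") inside each worker, combined with asynchronous pushes of parameter deltas from workers to a shared server vector. This is a toy least-squares illustration with assumed names (`worker`, `core_loop`, `server_w`); it is not the DPSGD implementation evaluated in the paper.

```python
# Illustrative sketch only (not the authors' DPSGD code): lock-free
# parallel SGD inside each worker, asynchronous delta pushes between
# workers and a shared "server" parameter vector.
import threading
import numpy as np
from multiprocessing import Array, Process

D = 10  # dimensionality of the toy least-squares problem


def grad(w, x, y):
    # Stochastic gradient of the per-sample loss 0.5 * (w @ x - y)**2.
    return (w @ x - y) * x


def core_loop(w_local, X, Y, lr, steps, seed):
    # One "core": updates the worker-local parameters WITHOUT locks,
    # so threads on the same worker may overwrite each other (by design).
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(len(Y))
        w_local -= lr * grad(w_local, X[i], Y[i])


def worker(server_w, X, Y, n_cores, lr, rounds, steps, seed):
    w_server = np.frombuffer(server_w)      # shared view, no lock
    for r in range(rounds):
        snapshot = w_server.copy()          # pull (possibly stale) params
        w_local = snapshot.copy()
        threads = [
            threading.Thread(target=core_loop,
                             args=(w_local, X, Y, lr, steps, seed + 7 * r + c))
            for c in range(n_cores)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Asynchronous push: add the local displacement to the server
        # parameters with no barrier; other workers interleave freely.
        w_server += w_local - snapshot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=D)
    X = rng.normal(size=(4000, D))
    Y = X @ w_true
    server_w = Array("d", D, lock=False)    # lock-free shared memory
    shards = np.array_split(np.arange(len(Y)), 4)
    procs = [
        Process(target=worker,
                args=(server_w, X[idx], Y[idx], 2, 0.005, 50, 200, k))
        for k, idx in enumerate(shards)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("parameter error:",
          np.linalg.norm(np.frombuffer(server_w) - w_true))
```

Pushing the displacement `w_local - snapshot` rather than overwriting the server vector lets concurrent workers' contributions accumulate without coordination, mirroring the asynchronous-distribution side of the paper's trade-off, while the unlocked thread updates inside `core_loop` mirror the lock-free parallelism side.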
Item Type: Journal Article
Subjects: Q Science > Q Science (General); Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: Faculty of Science, Engineering and Medicine > Engineering > WMG (Formerly the Warwick Manufacturing Group)
Library of Congress Subject Headings (LCSH): Machine learning, Mathematical optimization, Nonconvex programming, Stochastic processes -- Mathematical models, TensorFlow
Journal or Publication Title: Machine Learning
Publisher: Springer
ISSN: 0885-6125
Official Date: November 2022
Volume: 111
Number of Pages: 41
Page Range: pp. 4039-4079
DOI: 10.1007/s10994-022-06243-3
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access (Creative Commons)
Date of first compliant deposit: 1 November 2022
Date of first compliant Open Access: 1 November 2022