TurboMGNN: improving concurrent GNN training tasks on GPU with fine-grained kernel fusion
Wu, Wenchao, Shi, Xuanhua, He, Ligang and Jin, Hai (2023) TurboMGNN: improving concurrent GNN training tasks on GPU with fine-grained kernel fusion. IEEE Transactions on Parallel and Distributed Systems, 36 (6). pp. 1968-1981. doi:10.1109/tpds.2023.3267943. ISSN 1045-9219.
PDF: WRAP-TurboMGNN-improving-concurrent-GNN-training-tasks-GPU-fine-grained-kernel-fusion-He-2023.pdf (Published Version, 2644 KB). Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.
Official URL: https://doi.org/10.1109/tpds.2023.3267943
Abstract
Graph Neural Networks (GNNs) have evolved into powerful models for graph representation learning. Many works have been proposed to support efficient GNN training on GPUs. However, these works focus only on a single GNN training task, addressing aspects such as operator optimization, task scheduling, and programming models. Concurrent GNN training, which is needed in applications such as neural architecture search, has not yet been explored. This work aims to improve the training efficiency of concurrent GNN training tasks on a GPU by developing fine-grained methods to fuse kernels from different tasks. Specifically, we propose a fine-grained Sparse Matrix Multiplication (SpMM)-based kernel fusion method that eliminates redundant accesses to shared graph data. To increase fusion opportunities and reduce synchronization cost, we further propose a novel technique that enables the fusion of kernels across forward and backward propagation. Finally, to reduce the resource contention caused by the increased number of concurrent, heterogeneous GNN training tasks, we propose an adaptive strategy that groups the tasks and matches their operators according to resource contention. We have conducted extensive experiments, including kernel- and model-level benchmarks. The results show that the proposed methods achieve up to 2.6X performance speedup.
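The core observation behind SpMM-based fusion can be illustrated outside the paper's CUDA kernels. The sketch below (a NumPy/SciPy analogy, not the authors' implementation) shows why fusing the aggregation of two concurrent tasks pays off: both tasks aggregate over the same sparse adjacency matrix, so computing them in a single SpMM over concatenated feature matrices traverses the graph structure once instead of twice.

```python
import numpy as np
import scipy.sparse as sp

# Shared graph: one sparse adjacency matrix A (N x N) used by every task.
rng = np.random.default_rng(0)
N, F = 6, 4
A = sp.random(N, N, density=0.4, random_state=0, format="csr")

# Two concurrent GNN training tasks, each with its own node feature matrix.
H1 = rng.standard_normal((N, F))
H2 = rng.standard_normal((N, F))

# Unfused: each task runs its own SpMM, so the sparse graph structure
# (A.indptr / A.indices / A.data) is read from memory once per task.
out1 = A @ H1
out2 = A @ H2

# "Fused" SpMM: concatenate the feature matrices along the column axis
# and aggregate both tasks in a single pass that reads the graph once.
fused = A @ np.hstack([H1, H2])

# The fused result contains both tasks' aggregations side by side.
assert np.allclose(fused[:, :F], out1)
assert np.allclose(fused[:, F:], out2)
```

On a GPU, the redundant reads eliminated here are the dominant cost of the sparse aggregation, which is why fusing kernels from different tasks, rather than within one task, is the paper's focus.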
Item Type: Journal Article
Subjects: Q Science > QA Mathematics; Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: Faculty of Science, Engineering and Medicine > Science > Computer Science
SWORD Depositor: Library Publications Router
Library of Congress Subject Headings (LCSH): Neural networks (Computer science), Graph theory, Deep learning (Machine learning), Kernel functions
Journal or Publication Title: IEEE Transactions on Parallel and Distributed Systems
Publisher: IEEE
ISSN: 1045-9219
Official Date: June 2023
Volume: 36
Number: 6
Page Range: pp. 1968-1981
DOI: 10.1109/tpds.2023.3267943
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access (Creative Commons)
Date of first compliant deposit: 1 June 2023
Date of first compliant Open Access: 2 June 2023