Locality optimized unstructured mesh algorithms on GPUs
Sulyok, A. A., Balogh, G. D., Reguly, I. Z. and Mudalige, Gihan R. (2019) Locality optimized unstructured mesh algorithms on GPUs. Journal of Parallel and Distributed Computing, 134. pp. 50-64. doi:10.1016/j.jpdc.2019.07.011 ISSN 0743-7315.
PDF: WRAP-locality-optimized-unstructured-mesh-algorithms-Mudalige-2019.pdf - Accepted Version. Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.
Official URL: https://doi.org/10.1016/j.jpdc.2019.07.011
Abstract
Unstructured-mesh based numerical algorithms such as finite volume and finite element algorithms form an important class of applications for many scientific and engineering domains. The key difficulty in achieving higher performance from these applications is the indirect accesses that lead to data-races when parallelized. Current methods for handling such data-races lead to reduced parallelism and sub-optimal performance. Particularly on modern many-core architectures such as GPUs, which have increasing core/thread counts, reducing data movement and exploiting memory locality are vital for gaining good performance. In this work we present novel locality-exploiting optimizations for the efficient execution of unstructured-mesh algorithms on GPUs. Building on a two-layered coloring strategy for handling data races, we introduce novel re-ordering and partitioning techniques to further improve efficient execution. The new optimizations are then applied to several well-established unstructured-mesh applications, investigating their performance on NVIDIA’s latest P100 and V100 GPUs. We demonstrate significant speedups (1.1–1.75×) compared to the state-of-the-art. A range of performance metrics are benchmarked, including runtime, memory transactions, achieved bandwidth performance, GPU occupancy and data reuse factors, and are used to understand and explain the key factors impacting performance. The optimized algorithms are implemented as an open-source software library and we illustrate its use for improving performance of existing or new unstructured-mesh applications.
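The abstract refers to coloring as a way of handling data races caused by indirect accesses. As a rough illustration of the general idea only, the following sketch greedily colors mesh edges so that no two edges of the same color touch the same node; edges within one color could then update node data concurrently without conflicts. This is a single-level coloring written in plain Python, not the two-layered GPU scheme or the library from the paper, and the names `color_edges` and `accumulate_by_color` are hypothetical.

```python
def color_edges(edges, num_nodes):
    """Greedy coloring: edges that share a node receive different colors."""
    node_colors = [set() for _ in range(num_nodes)]  # colors already used at each node
    colors = []
    for a, b in edges:
        used = node_colors[a] | node_colors[b]
        c = 0
        while c in used:  # pick the smallest color free at both endpoints
            c += 1
        colors.append(c)
        node_colors[a].add(c)
        node_colors[b].add(c)
    return colors


def accumulate_by_color(edges, colors, edge_values, num_nodes):
    """Add each edge value to both of its nodes, processing one color at a time."""
    result = [0.0] * num_nodes
    for c in range(max(colors, default=-1) + 1):
        # Edges of one color touch disjoint nodes, so this inner loop is the
        # part that could run in parallel without racing on `result`.
        for (a, b), col, value in zip(edges, colors, edge_values):
            if col == c:
                result[a] += value
                result[b] += value
    return result


# Small example mesh (an assumption for illustration): a square with one diagonal.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
colors = color_edges(edges, num_nodes=4)
totals = accumulate_by_color(edges, colors, [1.0] * len(edges), num_nodes=4)
print(colors, totals)
```

The cost visible even in this toy version is that colors are processed one after another, which is the reduced parallelism the abstract mentions; a two-layered variant would apply the same idea once between thread blocks and once between threads inside a block.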
Item Type: | Journal Article
---|---
Subjects: | Q Science > QA Mathematics
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Finite volume method, Finite element method -- Computer programs, Graphics processing units -- Programming, Algorithms, Numerical grid generation (Numerical analysis)
Journal or Publication Title: | Journal of Parallel and Distributed Computing
Publisher: | Elsevier Science BV
ISSN: | 0743-7315
Official Date: | December 2019
Volume: | 134
Page Range: | pp. 50-64
DOI: | 10.1016/j.jpdc.2019.07.011
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Date of first compliant deposit: | 1 August 2019
Date of first compliant Open Access: | 9 August 2020