Communication-avoiding optimizations for large-scale unstructured-mesh applications with OP2
Ekanayake, Peduru Hewage Suneth Dasantha (2023) Communication-avoiding optimizations for large-scale unstructured-mesh applications with OP2. PhD thesis, University of Warwick.
Official URL: http://webcat.warwick.ac.uk/record=b3985145~S1
Abstract
This thesis presents data movement-reducing and communication-avoiding optimizations and their practicable implementation for large-scale unstructured-mesh numerical simulation applications. Utilizing the high-level abstractions of the OP2 domain-specific library, we reason about techniques for reduced communications across a consecutive sequence of loops – a loop-chain. The optimizations are explored for shared-memory systems where multiple processors share a common memory space and distributed-memory systems that comprise separate memory spaces across multiple nodes. We elucidate the challenges when executing unstructured-mesh applications on large-scale high-performance systems that are specifically related to data sharing and movement, synchronization, and communication among processes. A key feature of the work is to mitigate these problems for real-world, large-scale applications and computing kernels, bringing together proven and effective techniques within a DSL framework.
On shared-memory systems, we explore cache-blocking tiling, a key technique for exploiting data locality, in unstructured-mesh applications by integrating SLOPE, a cache-blocking tiling library, with OP2. For distributed-memory systems, we analyze the trade-off of performing redundant computation in place of data movement and design a new communication-avoiding back-end for OP2 that applies these techniques automatically to any OP2 application targeting CPUs and GPUs.
The communication-avoiding optimizations are applied to two non-trivial applications, including the OP2 version of Rolls-Royce’s production CFD application, Hydra, on problem sizes representative of real-world workloads. Results demonstrate that, for select configurations, the new communication-avoiding back-end provides runtime reductions of 30–65% for the loop-chains in these applications on both an HPE Cray EX system and an NVIDIA V100 GPU cluster. We model and examine the determinants and characteristics of a given unstructured-mesh loop-chain that lead to performance benefits with communication-avoidance techniques, providing insights into the general feasibility and profitability of using the optimizations for this class of applications.
Item Type: | Thesis (PhD)
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): | Parallel processing (Electronic computers), Parallel programming (Computer science), High performance computing, Computer algorithms
Official Date: | September 2023
Institution: | University of Warwick
Theses Department: | Department of Computer Science
Thesis Type: | PhD
Publication Status: | Unpublished
Supervisor(s)/Advisor: | Mudalige, Gihan R. ; Jarvis, Stephen A., 1970-
Sponsors: | Rolls-Royce plc ; Engineering and Physical Sciences Research Council ; Strategic Partnership in Computational Science for Advanced Simulation and Modelling of Engineering Systems (ASiMoV)
Extent: | xxiv, 210 pages : illustrations
Language: | eng