Loop tiling in large-scale stencil codes at run-time with OPS
Reguly, Istvan Zoltan, Mudalige, Gihan R. and Giles, Mike (2018) Loop tiling in large-scale stencil codes at run-time with OPS. IEEE Transactions on Parallel and Distributed Systems, 29 (4). pp. 873-886. doi:10.1109/TPDS.2017.2778161 ISSN 1045-9219.
PDF: WRAP-loop-tiling-large-scale-stencil-codes-run-time-OPS-Mudalige-2017.pdf - Accepted Version (972Kb)
PDF: WRAP-supplementary-material.pdf - Supplemental Material (368Kb)
Official URL: http://dx.doi.org/10.1109/TPDS.2017.2778161
Abstract
The key common bottleneck in most stencil codes is data movement, and prior research has shown that optimisations which improve data locality across loops perform particularly well. However, in many large PDE applications it is not possible to apply such optimisations through compilers because there are many options, execution paths and data per grid point, many dependent on run-time parameters, and the code is distributed across different compilation units. In this paper, we adapt the data-locality-improving optimisation known as tiling for use in large OPS applications on both shared-memory and distributed-memory systems, relying on run-time analysis and delayed execution. We evaluate our approach on a number of applications, observing speedups of 2x on the CloverLeaf 2D/3D proxy applications, which contain 83 (2D) and 141 (3D) loops, 3.5x on the linear solver TeaLeaf, and 1.7x on the compressible Navier-Stokes solver OpenSBLI. We demonstrate strong and weak scalability on up to 4608 cores of CINECA's Marconi supercomputer. We also evaluate our algorithms on Intel's Knights Landing, demonstrating maintained throughput as the problem size grows beyond 16 GB, and we perform scaling studies on up to 8704 cores. The approach is generally applicable to any stencil DSL that provides per-loop-nest data access information.
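As an illustration of the idea summarised in the abstract, the following is a minimal sketch of cross-loop tiling driven by delayed execution. It is not the OPS API or the algorithm from the paper: `LoopChain`, `par_loop`, `execute` and `stencil3` are hypothetical names, the grid is one-dimensional, and every queued loop is assumed to read only the output of the loop queued before it, with a known stencil radius. Loops are recorded rather than run, and when the queue is flushed each later loop's tile boundaries are shifted left by the accumulated stencil radius so that all values a tile needs have already been computed.

```python
def stencil3(src, dst, i):
    """Three-point stencil: reads src[i-1..i+1], writes dst[i]."""
    dst[i] = src[i - 1] + src[i] + src[i + 1]


class LoopChain:
    """Hypothetical delayed-execution queue (an assumption, not the OPS API)."""

    def __init__(self):
        self.loops = []  # queued (kernel, src, dst, radius) tuples

    def par_loop(self, kernel, src, dst, radius):
        # Record the loop instead of running it (delayed execution).
        self.loops.append((kernel, src, dst, radius))

    def execute(self, lo, hi, tile=None):
        """Run all queued loops over [lo, hi); tile them if tile is given."""
        if tile is None:
            # Untiled reference: each loop sweeps the whole range in turn.
            for kernel, src, dst, _ in self.loops:
                for i in range(lo, hi):
                    kernel(src, dst, i)
            self.loops.clear()
            return

        ntiles = max(1, -(-(hi - lo) // tile))  # ceiling division

        # Loop j trails loop 0 by the sum of the radii of loops 1..j.
        shifts, total = [], 0
        for j, (_, _, _, radius) in enumerate(self.loops):
            if j > 0:
                total += radius
            shifts.append(total)

        def bound(j, k):
            # k-th tile boundary of loop j; the final boundary is always hi.
            if k == ntiles:
                return hi
            return min(hi, max(lo, lo + k * tile - shifts[j]))

        # Run every loop on tile k before moving on to tile k + 1.
        for k in range(ntiles):
            for j, (kernel, src, dst, _) in enumerate(self.loops):
                for i in range(bound(j, k), bound(j, k + 1)):
                    kernel(src, dst, i)
        self.loops.clear()


def run(n, tile=None):
    """Two dependent stencil loops: a -> b -> c."""
    a = [(i * i) % 7 for i in range(n)]
    b = [0] * n
    c = [0] * n
    chain = LoopChain()
    chain.par_loop(stencil3, a, b, radius=1)
    chain.par_loop(stencil3, b, c, radius=1)
    chain.execute(1, n - 1, tile)
    return c
```

Calling `run(18, tile=4)` and `run(18)` should produce identical results; the tiled version only changes the order in which grid points are visited, so that the intermediate array `b` is reused while it is still likely to be in cache.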
Item Type: | Journal Article
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: | Faculty of Science, Engineering and Medicine > Engineering > Engineering
Library of Congress Subject Headings (LCSH): | Loop tiling (Computer science), Memory management (Computer science), Cache memory, Numerical grid generation (Numerical analysis)
Journal or Publication Title: | IEEE Transactions on Parallel and Distributed Systems
Publisher: | IEEE
ISSN: | 1045-9219
Official Date: | 1 April 2018
Volume: | 29
Number: | 4
Page Range: | pp. 873-886
DOI: | 10.1109/TPDS.2017.2778161
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Date of first compliant deposit: | 29 November 2017
Date of first compliant Open Access: | 29 November 2017