The Library

Vectorizing unstructured mesh computations for many-core architectures.

Reguly, I. Z., László, Endre, Mudalige, Gihan R. and Giles, Mike B. (2016) Vectorizing unstructured mesh computations for many-core architectures. Concurrency and Computation: Practice and Experience, 28 (2). pp. 557-577. doi:10.1002/cpe.3621 ISSN 1532-0626.

PDF
WRAP-Vectorizing-mesh-many-core-Mudalige-2015.pdf - Accepted Version

Official URL: http://dx.doi.org/10.1002/cpe.3621

Abstract

Achieving optimal performance on the latest multi-core and many-core architectures increasingly depends on making efficient use of the hardware's vector units. This paper presents results on achieving high performance through vectorization on CPUs and the Xeon-Phi on a key class of irregular applications: unstructured mesh computations. Using single instruction multiple thread (SIMT) and single instruction multiple data (SIMD) programming models, we show how unstructured mesh computations map to OpenCL or vector intrinsics through the use of code generation techniques in the OP2 Domain Specific Library and explore how irregular memory accesses and race conditions can be organized on different hardware. We benchmark Intel Xeon CPUs and the Xeon-Phi, using a tsunami simulation and a representative CFD benchmark. Results are compared with previous work on CPUs and NVIDIA GPUs to provide a comparison of achievable performance on current many-core systems. We show that auto-vectorization and the OpenCL SIMT model do not map efficiently to CPU vector units because of vectorization issues and threading overheads. In contrast, using SIMD vector intrinsics imposes some restrictions and requires more involved programming techniques but results in efficient code and near-optimal performance, two times faster than non-vectorized code. We observe that the Xeon-Phi does not provide good performance for these applications but is still comparable with a pair of mid-range Xeon chips.
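The gather, compute and scatter pattern that the abstract alludes to can be illustrated with a minimal C sketch. It is not taken from the paper or from OP2: the kernel, the names `edge_flux_scalar` and `edge_flux_blocked`, the `edge2node` layout and the vector width `VEC` are all hypothetical, chosen only to show how indirect reads can be packed into contiguous lanes for a vectorizable compute step, while the indirect increments are written back one lane at a time so that two edges sharing a node cannot race.

```c
#include <assert.h>
#include <stddef.h>

#define VEC 4 /* assumed vector width in doubles, e.g. one 256-bit register */

/* Reference version: one edge at a time. Each edge (a, b) computes a
 * hypothetical flux from its two node values and increments both nodes. */
void edge_flux_scalar(size_t n_edges, const int *edge2node,
                      const double *q, double *res)
{
    assert(edge2node && q && res);
    for (size_t e = 0; e < n_edges; ++e) {
        int a = edge2node[2 * e], b = edge2node[2 * e + 1];
        double flux = 0.5 * (q[a] - q[b]);
        res[a] -= flux;
        res[b] += flux;
    }
}

/* Blocked version: process VEC edges per iteration in three phases. */
void edge_flux_blocked(size_t n_edges, const int *edge2node,
                       const double *q, double *res)
{
    assert(edge2node && q && res);
    size_t e = 0;
    for (; e + VEC <= n_edges; e += VEC) {
        double qa[VEC], qb[VEC], flux[VEC];

        /* Gather: indirect reads packed into contiguous lanes. */
        for (size_t l = 0; l < VEC; ++l) {
            qa[l] = q[edge2node[2 * (e + l)]];
            qb[l] = q[edge2node[2 * (e + l) + 1]];
        }

        /* Compute: unit stride and no cross-lane dependence, so a
         * compiler (or hand-written intrinsics) can use vector units. */
        for (size_t l = 0; l < VEC; ++l)
            flux[l] = 0.5 * (qa[l] - qb[l]);

        /* Scatter: serialized, because two lanes may update the same node. */
        for (size_t l = 0; l < VEC; ++l) {
            res[edge2node[2 * (e + l)]]     -= flux[l];
            res[edge2node[2 * (e + l) + 1]] += flux[l];
        }
    }

    /* Remainder edges that do not fill a whole block. */
    for (; e < n_edges; ++e) {
        int a = edge2node[2 * e], b = edge2node[2 * e + 1];
        double flux = 0.5 * (q[a] - q[b]);
        res[a] -= flux;
        res[b] += flux;
    }
}
```

Because the scatter phase applies increments in the original edge order, the blocked version produces the same result as the scalar loop. An alternative to serializing the scatter is to color the edges so that no two edges in a block share a node, which trades extra preprocessing for a fully vectorizable write-back.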

Item Type:

Journal Article

Subjects:

Q Science > QA Mathematics

Divisions:

Faculty of Science, Engineering and Medicine > Science > Computer Science

Library of Congress Subject Headings (LCSH):

Computer programming, Microprocessors -- Programming, Parallel programming (Computer science)

Journal or Publication Title:

Concurrency and Computation: Practice and Experience

Publisher:

John Wiley & Sons Ltd.

ISSN:

1532-0626

Official Date:

February 2016

Dates:

Date	Event
February 2016	Published
28 August 2015	Available
17 July 2015	Accepted
15 April 2014	Submitted

Volume:

28

Number:

2

Page Range:

pp. 557-577

DOI:

10.1002/cpe.3621

Status:

Peer Reviewed

Publication Status:

Published

Access rights to Published version:

Restricted or Subscription Access

Date of first compliant deposit:

3 May 2017

Date of first compliant Open Access:

3 May 2017

Funder:

Engineering and Physical Sciences Research Council (EPSRC), Great Britain. Technology Strategy Board, Rolls-Royce Group plc

Grant number:

EP/I006079/1 ; EP/I00677X/1 ; TAMOP-4.2.1./B-11/2/KMR-2011-002 ; TAMOP-4.2.2./B-10/1-2010-0014

University of Warwick
Publications service & WRAP