University of Warwick
Publications service & WRAP

On the acceleration of wavefront applications using distributed many-core architectures

Pennycook, Simon J., Hammond, Simon D., Mudalige, Gihan R., Wright, Steven A. and Jarvis, Stephen A. (2012) On the acceleration of wavefront applications using distributed many-core architectures. Computer Journal, Vol.55 (No.2). pp. 138-153. ISSN 0010-4620

Full text not available from this repository.
Official URL: http://dx.doi.org/10.1093/comjnl/bxr073

Abstract

In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications, a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.
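
As a rough illustration of the data dependency that defines a wavefront computation, the sketch below sweeps a small 2D grid in which each cell depends on its north and west neighbours. It is not the LU solver, nor the GPU implementation or k-blocking strategy described in the paper; the function name `wavefront_sweep`, the 2D grid, and the update rule are all assumptions chosen only to make the dependency pattern visible.

```python
def wavefront_sweep(n, m):
    """Sweep an n x m grid along anti-diagonals (illustrative only).

    Cell (i, j) depends on (i - 1, j) and (i, j - 1), so every cell on the
    same anti-diagonal (i + j == d) is independent of the others and could
    be computed in parallel.
    """
    grid = [[0] * m for _ in range(n)]
    steps = []  # cells processed at each wavefront step

    for d in range(n + m - 1):
        # All valid (i, j) with i + j == d form one wavefront.
        cells = [(i, d - i) for i in range(max(0, d - m + 1), min(n, d + 1))]
        for i, j in cells:
            north = grid[i - 1][j] if i > 0 else 0
            west = grid[i][j - 1] if j > 0 else 0
            # Hypothetical update rule: record the step at which the cell
            # becomes computable, counting from 1.
            grid[i][j] = 1 + max(north, west)
        steps.append(cells)

    return grid, steps


grid, steps = wavefront_sweep(3, 4)
for d, cells in enumerate(steps):
    print(d, cells)
```

In this toy setting the number of sequential steps is `n + m - 1`, and the amount of available parallelism grows and then shrinks as the wavefront crosses the grid, which is one reason sustained performance on highly parallel hardware can fall short of its theoretical peak.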

Item Type: Journal Article
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
T Technology > T Technology (General)
Divisions: Faculty of Science > Computer Science
Library of Congress Subject Headings (LCSH): Graphics processing units, Parallel algorithms
Journal or Publication Title: Computer Journal
Publisher: Oxford University Press
ISSN: 0010-4620
Date: February 2012
Volume: Vol.55
Number: No.2
Page Range: pp. 138-153
Identification Number: 10.1093/comjnl/bxr073
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Funder: Royal Society (Great Britain), Atomic Weapons Establishment (Great Britain) (AWE), Knowledge Transfer Partnerships
Grant number: IF090020/AM (RS), KTP006740 (KTP)
URI: http://wrap.warwick.ac.uk/id/eprint/37025
