University of Warwick
Publications service & WRAP

Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark

Pennycook, Simon J., Hammond, Simon D., Mudalige, Gihan R. and Jarvis, Stephen A., 1970- (2010) Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark. In: 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), New Orleans, LA, USA, 13-19 Nov 2010

Full text not available from this repository.

Abstract

The emergence of Graphics Processing Units (GPUs) as a potential alternative to conventional general-purpose processors has led to significant interest in these architectures from both the academic community and the High Performance Computing (HPC) industry. While GPUs look likely to deliver unparalleled levels of performance, published studies claiming performance improvements in excess of 30,000x are misleading. Significant on-node performance improvements have been demonstrated for code kernels and algorithms amenable to GPU acceleration, but studies demonstrating comparable results for full scientific applications requiring multiple-GPU architectures are rare. In this paper we present an analysis of a port of the NAS LU benchmark to NVIDIA's Compute Unified Device Architecture (CUDA), the most stable GPU programming model currently available. Our solution is also extended to multiple nodes and multiple GPU devices. Runtime performance on several GPUs is presented, ranging from low-end, consumer-grade cards such as the 8400GS to NVIDIA's flagship Fermi HPC processor found in the recently released C2050. We compare the runtimes of these devices to those of several processors, including ones from Intel, AMD and IBM. In addition, we use a recently developed performance model of LU to predict the runtime performance of LU on large-scale distributed GPU clusters, which are expected to become commonplace in future high-end HPC architectural solutions.

Item Type: Conference Item (Paper)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software; QA76.73
Divisions: Faculty of Science > Computer Science
Date: November 2010
Status: Not Peer Reviewed
Publication Status: Published
Description: The 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High-Performance Computing Systems (PMBS 10) was held as part of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 10) in New Orleans, Louisiana, USA.
Conference Paper Type: Paper
Title of Event: 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10)
Type of Event: Workshop
Location of Event: New Orleans, LA, USA
Date(s) of Event: 13-19 Nov 2010
URI: http://wrap.warwick.ac.uk/id/eprint/47467
