Skip to content Skip to navigation
University of Warwick
  • Study
  • |
  • Research
  • |
  • Business
  • |
  • Alumni
  • |
  • News
  • |
  • About

University of Warwick
Publications service & WRAP

Highlight your research

  • WRAP
    • Home
    • Search WRAP
    • Browse by Warwick Author
    • Browse WRAP by Year
    • Browse WRAP by Subject
    • Browse WRAP by Department
    • Browse WRAP by Funder
    • Browse Theses by Department
  • Publications Service
    • Home
    • Search Publications Service
    • Browse by Warwick Author
    • Browse Publications service by Year
    • Browse Publications service by Subject
    • Browse Publications service by Department
    • Browse Publications service by Funder
  • Help & Advice
University of Warwick

The Library

  • Login
  • Admin

Evaluating the performance of legacy applications on emerging parallel architectures

Tools
- Tools
+ Tools

Pennycook, Simon J. (2012) Evaluating the performance of legacy applications on emerging parallel architectures. PhD thesis, University of Warwick.

[img]
Preview
Text
WRAP_THESIS_Pennycook_2012.pdf - Submitted Version

Download (2386Kb) | Preview
Official URL: http://webcat.warwick.ac.uk/record=b2684550~S1

Request Changes to record.

Abstract

The gap between a supercomputer's theoretical maximum (\peak")
oatingpoint
performance and that actually achieved by applications has grown wider
over time. Today, a typical scientific application achieves only 5{20% of any
given machine's peak processing capability, and this gap leaves room for significant
improvements in execution times.
This problem is most pronounced for modern \accelerator" architectures
{ collections of hundreds of simple, low-clocked cores capable of executing the
same instruction on dozens of pieces of data simultaneously. This is a significant
change from the low number of high-clocked cores found in traditional CPUs,
and effective utilisation of accelerators typically requires extensive code and
algorithmic changes. In many cases, the best way in which to map a parallel
workload to these new architectures is unclear.
The principle focus of the work presented in this thesis is the evaluation
of emerging parallel architectures (specifically, modern CPUs, GPUs and Intel
MIC) for two benchmark codes { the LU benchmark from the NAS Parallel
Benchmark Suite and Sandia's miniMD benchmark { which exhibit complex
parallel behaviours that are representative of many scientific applications. Using
combinations of low-level intrinsic functions, OpenMP, CUDA and MPI, we
demonstrate performance improvements of up to 7x for these workloads.
We also detail a code development methodology that permits application developers
to target multiple architecture types without maintaining completely
separate implementations for each platform. Using OpenCL, we develop performance
portable implementations of the LU and miniMD benchmarks that are
faster than the original codes, and at most 2x slower than versions highly-tuned
for particular hardware.
Finally, we demonstrate the importance of evaluating architectures at scale
(as opposed to on single nodes) through performance modelling techniques,
highlighting the problems associated with strong-scaling on emerging accelerator
architectures.

Item Type: Thesis or Dissertation (PhD)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): Computer architecture, Parallel computers
Official Date: December 2012
Dates:
DateEvent
December 2012Submitted
Institution: University of Warwick
Theses Department: Department of Computer Science
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Jarvis, Stephen A., 1970-
Extent: xv, 140 leaves : illustrations.
Language: eng

Request changes or add full text files to a record

Repository staff actions (login required)

View Item View Item

Downloads

Downloads per month over past year

View more statistics

twitter

Email us: wrap@warwick.ac.uk
Contact Details
About Us