Quantifying the effects of contention on parallel file systems
Wright, Steven A. and Jarvis, Stephen A. (2015) Quantifying the effects of contention on parallel file systems. In: 16th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing, Hyderabad, India, 25-29 May 2015. Published in: 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW) pp. 932-940. doi:10.1109/IPDPSW.2015.8
PDF: WRAP_PID3574021.pdf (Accepted Version, 426Kb)
Official URL: http://dx.doi.org/10.1109/IPDPSW.2015.8
Abstract
As we move towards the Exascale era of supercomputing, node-level failures are becoming more commonplace; frequent checkpointing is currently used to recover from such failures in long-running science applications. While compute performance has steadily improved year-on-year, parallel I/O performance has stalled, meaning checkpointing is fast becoming a performance bottleneck. Using current file systems as efficiently as possible will alleviate some of these issues and will help prepare developers and system designers for Exascale; unfortunately, many domain scientists simply submit their jobs with the default file system configuration.
In this paper, we analyse previous work on finding optimal configurations for Lustre file systems, demonstrating that by exposing parallelism in the parallel file system, performance can be improved by up to 49x. However, we demonstrate that on systems where many applications are competing for a finite number of object storage targets (OSTs), competing tasks may considerably reduce the performance of otherwise optimal configurations. We show that reducing each job's request for OSTs by 40% decreases performance by only 13%, while increasing the availability and quality of service of the file system. Further, we present a series of metrics designed to analyse and explain the effects of contention on parallel file systems. Finally, we re-evaluate our previous work with the Parallel Log-structured File System (PLFS), comparing it to Lustre at various scales. We show that PLFS may outperform Lustre in particular configurations, but that at large scale PLFS itself becomes a performance bottleneck. We extend the metrics proposed in this paper to explain these performance deficiencies in PLFS, demonstrating that the software creates high levels of self-contention at scale.
Item Type: | Conference Item (Paper)
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Information storage and retrieval systems, Parallel file systems (Computer science), High performance computing
Journal or Publication Title: | 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
Publisher: | IEEE
Official Date: | 25 May 2015
Page Range: | pp. 932-940
DOI: | 10.1109/IPDPSW.2015.8
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Date of first compliant deposit: | 27 April 2016
Date of first compliant Open Access: | 27 April 2016
Conference Paper Type: | Paper
Title of Event: | 16th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing
Type of Event: | Workshop
Location of Event: | Hyderabad, India
Date(s) of Event: | 25-29 May 2015