Case study of error recovery and error propagation on ranger
Chuah, Edward, Jhumka, Arshad, Alt, Samantha, Damoulas, Theodoros, Gurumdimma, Nentawe, Sawley, Marie-Christine, Barth, William L., Minyard, Tommy and Browne, James C. (2017) Case study of error recovery and error propagation on ranger. In: 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2017), Jaipur, India, 18-21 Dec 2017 (Unpublished)
PDF: WRAP-case-study-error-recovery-ranger-Chuah-2017.pdf - Submitted Version (1938Kb)
Abstract
We give the details of two new dependability-oriented use cases on recovery attempt and error propagation on the Ranger supercomputer. The use cases are: (i) error propagation between the Lustre file-system I/O and InfiniBand, and (ii) recovery attempt and its impact on the chipset and memory system.
| Item Type: | Conference Item (Paper) |
|---|---|
| Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software |
| Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Library of Congress Subject Headings (LCSH): | Supercomputers, Data recovery (Computer science), InfiniBand (Standard) |
| Official Date: | 8 September 2017 |
| Status: | Peer Reviewed |
| Publication Status: | Unpublished |
| Description: | This paper contains two other case studies and is a companion report to the main technical paper which has been accepted for presentation and publication at the 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2017). See related item in WRAP link |
| Date of first compliant deposit: | 13 September 2017 |
| Conference Paper Type: | Paper |
| Title of Event: | 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2017) |
| Type of Event: | Conference |
| Location of Event: | Jaipur, India |
| Date(s) of Event: | 18-21 Dec 2017 |