Fault-Tolerance Techniques for High-Performance Computing

Specifications
Paperback | English
Springer International Publishing | 2016
ISBN13: 9783319355603

Summary

This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

Specifications

ISBN13: 9783319355603
Language: English
Binding: paperback
Publisher: Springer International Publishing

Table of Contents

Part I: General Overview

Fault-Tolerance Techniques for High-Performance Computing
Jack Dongarra, Thomas Herault and Yves Robert

Part II: Technical Contributions

Errors and Faults
Ana Gainaru and Franck Cappello

Fault-Tolerant MPI
Aurelien Bouteiller

Using Replication for Resilience on Exascale Systems
Henri Casanova, Frédéric Vivien and Dounia Zaidouni

Energy-Aware Checkpointing Strategies
Guillaume Aupy, Anne Benoit, Mohammed El Mehdi Diouri, Olivier Glück and Laurent Lefèvre
