Fault-Tolerance Techniques for High-Performance Computing

Name: Fault-Tolerance Techniques for High-Performance Computing
Author: Thomas Herault

Specifications

Paperback, pp. | English

Springer International Publishing | edition, 2016

ISBN13: 9783319355603

Classification

Legal:

Part of the series Computer Communications and Networks

Expected delivery time: approximately 9 working days

120,99

Summary

This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). It opens with a detailed introduction to checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. A review of general-purpose techniques follows, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

Specifications

ISBN13: 9783319355603

Language: English

Binding: Paperback

Publisher: Springer International Publishing

Series: Computer Communications and Networks

Table of Contents

Part I: General Overview

Fault-Tolerance Techniques for High-Performance Computing (Jack Dongarra, Thomas Herault and Yves Robert)

Part II: Technical Contributions

Errors and Faults (Ana Gainaru and Franck Cappello)

Fault-Tolerant MPI (Aurelien Bouteiller)

Using Replication for Resilience on Exascale Systems (Henri Casanova, Frédéric Vivien and Dounia Zaidouni)

Energy-Aware Checkpointing Strategies (Guillaume Aupy, Anne Benoit, Mohammed El Mehdi Diouri, Olivier Glück and Laurent Lefèvre)
