This paper presents a novel approach to the design of multi-/many-core systems with an adaptive level of reliability. The approach defines a layer at the operating system level that achieves fault detection/tolerance/diagnosis properties by means of thread replication and re-execution mechanisms. The layer applies the most convenient hardening mechanism to achieve the desired trade-off between reliability and performance by adapting at run-time to the changes of the working scenario. The proposed strategy has been applied in a set of experimental sessions considering a real-world parallel application, to evaluate its benefits on the final system with respect to various strategies selected at design time.

Self-Adaptive Fault Tolerance in Multi-/Many-Core Systems

BOLCHINI, CRISTIANA;CARMINATI, MATTEO;MIELE, ANTONIO ROSARIO
2013-01-01

Abstract

This paper presents a novel approach to the design of multi-/many-core systems with an adaptive level of reliability. The approach defines a layer at the operating system level that achieves fault detection/tolerance/diagnosis properties by means of thread replication and re-execution mechanisms. The layer applies the most convenient hardening mechanism to achieve the desired trade-off between reliability and performance by adapting at run-time to the changes of the working scenario. The proposed strategy has been applied in a set of experimental sessions considering a real-world parallel application, to evaluate its benefits on the final system with respect to various strategies selected at design time.
2013
File in questo prodotto:
File Dimensione Formato  
art%3A10.1007%2Fs10836-013-5367-y.pdf

Accesso riservato

: Publisher’s version
Dimensione 1.36 MB
Formato Adobe PDF
1.36 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/718746
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 23
  • ???jsp.display-item.citation.isi??? 18
social impact