This paper presents a dynamic scheduling solution to achieve fault tolerance in many-core architectures. Triple Modular Redundancy is applied on the multi-threaded application to dynamically mitigate the effects of both permanent and transient faults, and to identify and isolate damaged units. The approach targets the best performance, while balancing the use of the healthy resources to limit wear-out and aging effects, which cause permanent damages. Experimental results on synthetic case studies are reported, to validate the ability to tolerate faults while optimizing performance and resource usage.
An adaptive approach for online fault management in many-core architectures
BOLCHINI, CRISTIANA;MIELE, ANTONIO ROSARIO;SCIUTO, DONATELLA
2012-01-01
Abstract
This paper presents a dynamic scheduling solution to achieve fault tolerance in many-core architectures. Triple Modular Redundancy is applied on the multi-threaded application to dynamically mitigate the effects of both permanent and transient faults, and to identify and isolate damaged units. The approach targets the best performance, while balancing the use of the healthy resources to limit wear-out and aging effects, which cause permanent damages. Experimental results on synthetic case studies are reported, to validate the ability to tolerate faults while optimizing performance and resource usage.File in questo prodotto:
File | Dimensione | Formato | |
---|---|---|---|
06176589.pdf
Accesso riservato
:
Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione
276.33 kB
Formato
Adobe PDF
|
276.33 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.