Cross-Layer Reliability Analysis of NVDLA Accelerators: Exploring the Configuration Space
Nazzari A.;Passarello D.;Cassano L.;Miele A.;Bolchini C.
2024-01-01
Abstract
Investigating the effects of Single Event Upsets in domain-specific accelerators is one of the key enablers for deploying Deep Neural Networks (DNNs) in mission-critical edge applications. Current reliability analyses of DNNs focus either on the DNN model, at the application level, or on the hardware accelerator, at the architecture level. This paper presents a systematic cross-layer reliability analysis of the NVIDIA Deep Learning Accelerator (NVDLA), a popular family of industry-grade, open and free DNN accelerators. The goals are i) to analyze the propagation of faults from the hardware to the application level, and ii) to compare different architectural configurations. Our investigation delivers new insights into the performance-accuracy-reliability trade-off spanned by the configuration space of Deep Learning accelerators. In particular, the Failure In Time (FIT) rate can be reduced by up to 4.3x for the same DNN model accuracy and by up to 9.4x for the same performance, at the cost of a 6.5x inference latency increase and a 1.1% accuracy drop, respectively.
File | Size | Format | Access
---|---|---|---
ETS2024.pdf (Pre-print, pre-refereeing) | 310.17 kB | Adobe PDF | Restricted access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.