Cross-Layer Reliability Analysis of NVDLA Accelerators: Exploring the Configuration Space

Nazzari A.; Passarello D.; Cassano L.; Miele A.; Bolchini C.
2024-01-01

Abstract

Investigating the effects of Single Event Upsets in domain-specific accelerators is one of the key enablers for deploying Deep Neural Networks (DNNs) in mission-critical edge applications. Currently, reliability analyses of DNNs focus mainly either on the DNN model, at the application level, or on the hardware accelerator, at the architecture level. This paper presents a systematic cross-layer reliability analysis of the NVIDIA Deep Learning Accelerator (NVDLA), a popular family of industry-grade, open and free DNN accelerators. The goals are i) to analyze the propagation of faults from the hardware to the application level, and ii) to compare different architectural configurations. Our investigation delivers new insights into the performance-accuracy-reliability trade-off spanned by the configuration space of Deep Learning accelerators. In particular, the Failure in Time can be reduced by up to 4.3x for the same DNN model accuracy and by up to 9.4x for the same performance, at the cost of 6.5x the inference latency and a 1.1% accuracy drop, respectively.
2024
Proceedings of the European Test Workshop
Accelerators
Cross-Layer Reliability Analysis
Deep Neural Networks
Error Simulation
Fault Injection
Files in this item:
File: ETS2024.pdf
Access: Restricted
Type: Pre-print (or pre-refereeing)
Size: 310.17 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1275977