Accelerating Binary and Mixed-Precision NNs Inference on STMicroelectronics Embedded NPU with Digital In-Memory-Computing
Fabrizio Indirli; Cristina Silvano
2023-01-01
Abstract
The proliferation of embedded Neural Processing Units (NPUs) is enabling the adoption of Tiny Machine Learning for numerous cognitive computing applications on the edge, where maximizing energy efficiency is key. To overcome the limitations of traditional von Neumann architectures, novel designs based on computational memories are arising. STMicroelectronics is developing an experimental low-power NPU that integrates Digital In-Memory Computing (DIMC) SRAM with a modular dataflow inference engine, capable of accelerating a wide range of DNNs. In this work, we present a preliminary 40 nm version of this architecture with DIMC-SRAM tiles capable of in-memory binary computations, which dramatically increase the computational efficiency of binary layers. We performed power/performance analyses to demonstrate the advantages of this paradigm, which in our experiments achieved a TOPS/W efficiency up to 40x higher than traditional NPU implementations. We then extended the ST Neural compilation toolchain to automatically map binary and mixed-precision NNs onto the NPU, applying high-level optimizations and binding the models’ binary GEMM and CONV layers to the DIMC tiles. The overall system was validated by developing three real-time applications that represent potential power-constrained real-world use cases: fan-spinning anomaly detection, keyword spotting, and face presence detection. The applications ran with a latency below 3 ms, and the DIMC subsystem achieved a peak efficiency above 100 TOPS/W for binary in-memory computations.
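Binary GEMM and CONV layers like those the abstract binds to the DIMC tiles are commonly reduced to XNOR and popcount operations on bit-packed {-1, +1} vectors. The following is a minimal Python sketch of that general equivalence only; the function names (`binary_dot`, `pack`) are illustrative and do not come from the paper, and this is not ST's DIMC implementation.

```python
def pack(vec):
    """Pack a list of +1/-1 values into an integer bitmask (bit 1 encodes +1)."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors given as packed bitmasks.

    Bits that differ contribute -1 and bits that match contribute +1,
    so the dot product equals n minus twice the number of mismatches.
    """
    mismatches = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack(a), pack(b), len(a)))  # → 0
```

Because the inner product collapses to a bitwise XOR (the complement of XNOR) followed by a popcount, each DIMC word line can evaluate a long binary dot product in place, which is the source of the efficiency gains reported above.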
File | Size | Format |
---|---|---|
ewc23_paper.pdf (Publisher’s version, restricted access) | 5.32 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.