RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

In the past few years we have experienced an extremely rapid growth of modern applications based on deep learning algorithms such as Convolutional Neural Network (CNN), and consequently, an intensification of academic and industrial research focused on the optimization of their imple- mentation. Among the different alternatives that have been ex- plored, FPGAs seems to be one of the most attractive, as they are able to deliver high performance and energy-efficiency, thanks to their inherent parallelism and direct hardware execution, while retaining extreme flexibility due to their reconfigurability.In this paper we present a design methodology of a dataflow accelerator for the implementation of CNNs on FPGAs, that ensures scalability - and achieve a higher degree of parallelism as the size of the CNN increases - and an efficient exploitation of the available resources. Furthermore, we analyze resource consumption of the layers of the CNN as well as latency in relation to the implementation's hyperparameters. Finally, we show that the proposed design implements a high-level pipeline between the different network layers, and as a result, we can improve the latency to process an image by feeding the CNN with batches of multiple images.

On How to Design Dataflow FPGA-Based Accelerators for Convolutional Neural Networks

NATALE, GIUSEPPE;BACIS, MARCO;SANTAMBROGIO, MARCO DOMENICO

2017-01-01

Abstract

In the past few years we have experienced an extremely rapid growth of modern applications based on deep learning algorithms such as Convolutional Neural Network (CNN), and consequently, an intensification of academic and industrial research focused on the optimization of their imple- mentation. Among the different alternatives that have been ex- plored, FPGAs seems to be one of the most attractive, as they are able to deliver high performance and energy-efficiency, thanks to their inherent parallelism and direct hardware execution, while retaining extreme flexibility due to their reconfigurability.In this paper we present a design methodology of a dataflow accelerator for the implementation of CNNs on FPGAs, that ensures scalability - and achieve a higher degree of parallelism as the size of the CNN increases - and an efficient exploitation of the available resources. Furthermore, we analyze resource consumption of the layers of the CNN as well as latency in relation to the implementation's hyperparameters. Finally, we show that the proposed design implements a high-level pipeline between the different network layers, and as a result, we can improve the latency to process an image by feeding the CNN with batches of multiple images.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2017
			
	Titolo del libro
	
				Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
			
	ISBN (International Standard Book Number)
	
				9781509067626
			
	Parole chiave
	
				Convolutional Neural Networks; Dataflow Architectures; Field Programmable Gate Arrays; Hardware and Architecture; Control and Systems Engineering; Electrical and Electronic Engineering
			
	Appare nelle tipologie:
	
				04.1 Contributo in Atti di convegno

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1032009

Citazioni

ND

7

ND

social impact