In the past few years we have experienced an extremely rapid growth of modern applications based on deep learning algorithms such as Convolutional Neural Network (CNN), and consequently, an intensification of academic and industrial research focused on the optimization of their imple- mentation. Among the different alternatives that have been ex- plored, FPGAs seems to be one of the most attractive, as they are able to deliver high performance and energy-efficiency, thanks to their inherent parallelism and direct hardware execution, while retaining extreme flexibility due to their reconfigurability.In this paper we present a design methodology of a dataflow accelerator for the implementation of CNNs on FPGAs, that ensures scalability - and achieve a higher degree of parallelism as the size of the CNN increases - and an efficient exploitation of the available resources. Furthermore, we analyze resource consumption of the layers of the CNN as well as latency in relation to the implementation's hyperparameters. Finally, we show that the proposed design implements a high-level pipeline between the different network layers, and as a result, we can improve the latency to process an image by feeding the CNN with batches of multiple images.

On How to Design Dataflow FPGA-Based Accelerators for Convolutional Neural Networks

NATALE, GIUSEPPE;BACIS, MARCO;SANTAMBROGIO, MARCO DOMENICO
2017-01-01

Abstract

In the past few years we have experienced an extremely rapid growth of modern applications based on deep learning algorithms such as Convolutional Neural Network (CNN), and consequently, an intensification of academic and industrial research focused on the optimization of their imple- mentation. Among the different alternatives that have been ex- plored, FPGAs seems to be one of the most attractive, as they are able to deliver high performance and energy-efficiency, thanks to their inherent parallelism and direct hardware execution, while retaining extreme flexibility due to their reconfigurability.In this paper we present a design methodology of a dataflow accelerator for the implementation of CNNs on FPGAs, that ensures scalability - and achieve a higher degree of parallelism as the size of the CNN increases - and an efficient exploitation of the available resources. Furthermore, we analyze resource consumption of the layers of the CNN as well as latency in relation to the implementation's hyperparameters. Finally, we show that the proposed design implements a high-level pipeline between the different network layers, and as a result, we can improve the latency to process an image by feeding the CNN with batches of multiple images.
2017
Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
9781509067626
Convolutional Neural Networks; Dataflow Architectures; Field Programmable Gate Arrays; Hardware and Architecture; Control and Systems Engineering; Electrical and Electronic Engineering
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1032009
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? ND
social impact