
A Tile-based Fused-layer CNN Accelerator for FPGAs

Indirli, Fabrizio; Erdem, Ahmet; Silvano, Cristina
2020-01-01

Abstract

The acceleration of Convolutional Neural Networks (CNNs) on FPGAs is becoming increasingly popular for computer vision tasks. However, the limited memory and bandwidth of these devices pose significant challenges to the design of conventional CNN accelerators, which use external DRAM to store the intermediate results of each layer. To mitigate these issues, researchers have proposed the fused-layer methodology, which reduces accesses to external DRAM by simultaneously accelerating multiple consecutive layers on the same chip. In this work, we propose a configurable fused-layer accelerator that exploits output tiling and the half-precision floating-point datatype to reduce resource utilization. We assessed its effectiveness with experiments on the VGG-16 and Yolo-Lite CNNs, targeting a Xilinx Zynq ZU6EG FPGA. Our design achieved up to a 42% speedup and up to 95% fewer transfers from external memory compared to a single-layer baseline solution. Moreover, to ease and speed up design space exploration, we developed a Machine Learning model that predicts the performance and the resource utilization of our accelerator with over 90% accuracy on the reported dataset.
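To illustrate the fused-layer idea the abstract describes, the sketch below computes one tile of a second convolutional layer's output directly from the input, producing the first layer's intermediate values on the fly so they never need to be written to external memory. This is a minimal single-channel NumPy illustration of the general technique, not the paper's accelerator; the function names and tile parameters are hypothetical.

```python
import numpy as np

def conv2d(x, w):
    # Naive "valid" 2-D convolution (single channel), for illustration only.
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def fused_two_layer_tile(x, w1, w2, ti, tj, tile):
    # Compute one tile x tile block of the layer-2 output, anchored at
    # (ti, tj), directly from the input. The layer-1 intermediate is
    # recomputed inside the tile's halo and never stored in full -- this
    # is the DRAM-traffic saving that fused-layer accelerators exploit.
    k1, k2 = w1.shape[0], w2.shape[0]
    halo = tile + (k2 - 1) + (k1 - 1)   # input region the tile depends on
    xin = x[ti:ti + halo, tj:tj + halo]
    return conv2d(conv2d(xin, w1), w2)
```

For example, with two 3x3 kernels, a 4x4 output tile depends on an 8x8 input halo; stitching together all such tiles reproduces the full two-layer output, at the cost of some recomputation along tile borders.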
2020
Proceedings of the 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS)
978-1-7281-6044-3
Hardware accelerators for Neural Networks
FPGA
Files in this record:
Tiled-based_ICECS_2020_09294981.pdf — Published article (publisher's version), 784.33 kB, Adobe PDF, restricted access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1158672
Citations
  • Scopus: 3
  • Web of Science: 2