
A Tile-based Fused-layer CNN Accelerator for FPGAs

Indirli, Fabrizio; Erdem, Ahmet; Silvano, Cristina
2020-01-01

Abstract

The acceleration of Convolutional Neural Networks (CNNs) on FPGAs is becoming increasingly popular for computer vision tasks. However, the limited memory and bandwidth of these devices pose significant challenges to the design of conventional CNN accelerators, which use external DRAM to store the intermediate results of each layer. To mitigate these issues, researchers have proposed the fused-layer methodology, which reduces accesses to external DRAM by simultaneously accelerating multiple consecutive layers on the same chip. In this work, we propose a configurable fused-layer accelerator that exploits output tiling and the half-precision floating-point datatype to reduce resource utilization. We assessed its effectiveness with experiments on the VGG-16 and Yolo-Lite CNNs, targeting a Xilinx Zynq ZU6EG FPGA. Our design achieved up to a 42% speedup and up to 95% fewer transfers from external memory compared to a single-layer baseline solution. Moreover, to ease and speed up design space exploration, we developed a Machine Learning model that predicts the performance and the resource utilization of our accelerator with over 90% accuracy on the reported dataset.
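To illustrate the fused-layer idea the abstract describes, the sketch below computes one tile of a second convolutional layer's output directly from the input, producing the first layer's intermediate values on the fly so they never need to be written to external memory. This is a minimal single-channel NumPy illustration of the general technique, not the paper's accelerator; the function names and tile parameters are hypothetical.

```python
import numpy as np

def conv2d(x, w):
    # Naive "valid" 2-D convolution (single channel), for illustration only.
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def fused_two_layer_tile(x, w1, w2, ti, tj, tile):
    # Compute one tile x tile block of the layer-2 output, anchored at
    # (ti, tj), directly from the input. The layer-1 intermediate is
    # recomputed inside the tile's halo and never stored in full -- this
    # is the DRAM-traffic saving that fused-layer accelerators exploit.
    k1, k2 = w1.shape[0], w2.shape[0]
    halo = tile + (k2 - 1) + (k1 - 1)   # input region the tile depends on
    xin = x[ti:ti + halo, tj:tj + halo]
    return conv2d(conv2d(xin, w1), w2)
```

For example, with two 3x3 kernels, a 4x4 output tile depends on an 8x8 input halo; stitching together all such tiles reproduces the full two-layer output, at the cost of some recomputation along tile borders.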
2020
Proceedings of the 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS)
978-1-7281-6044-3
Hardware accelerators for Neural Networks
FPGA
Files in this record:
Tiled-based_ICECS_2020_09294981.pdf — Published article (publisher's version), 784.33 kB, Adobe PDF, restricted access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1158672
Citations
  • Scopus: 3
  • Web of Science: 2