RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano

An FPGA-based acceleration methodology and performance model for iterative stencils

Reggiani, Enrico;Natale, Giuseppe;Moroni, Carlo;Santambrogio, Marco D.

2018-01-01

Abstract

Iterative stencils are the core computational kernel of many applications across different domains, from scientific computing to finance. Given their complex dependencies and low computation-to-memory-access ratio, these kernels are a challenging acceleration target on every architecture. This is especially true for FPGAs, whose direct hardware execution offers the possibility of high performance and power efficiency, but whose non-fixed architecture can lead to very large solution spaces to be explored. In this work, we build upon a previously presented FPGA-based acceleration methodology for iterative stencil algorithms, which provides a dataflow architectural template that implements optimal on-chip buffering and scales almost linearly in performance using a technique denoted as iterations queuing. In particular, we propose a set of design improvements and develop an accurate analytical performance model that can be used to support the exploration of the design space. Experimental results, obtained by implementing a set of benchmarks from different application domains on a Xilinx VC707 board, show average performance and power efficiency increases over the previous work of 22.8x and 9.9x, respectively, and a prediction error that is 0.5% on average.
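For readers unfamiliar with the term, the following is a minimal software sketch of an iterative stencil, assuming a 2D four-point Jacobi update with fixed boundary cells. It is not taken from the paper: the function names `jacobi_step` and `iterate_stencil` are hypothetical, and the code illustrates only the kernel class, not the FPGA dataflow template or the iterations queuing technique.

```python
def jacobi_step(grid):
    """Apply one four-point Jacobi sweep; boundary cells are left unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so every read sees the previous iteration
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each interior cell becomes the average of its four neighbours:
            # four reads and one write for three additions and one multiplication.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def iterate_stencil(grid, iterations):
    """Repeat the sweep; each iteration depends on the full result of the previous one."""
    for _ in range(iterations):
        grid = jacobi_step(grid)
    return grid
```

The few arithmetic operations per memory access in the inner loop, together with the dependency of every sweep on the previous one, are what the abstract refers to as a low computation-to-memory-access ratio and complex dependencies.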

Year of publication: 2018
Book title: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
ISBN (International Standard Book Number): 9781538655559
Keywords: Dataflow Architectures; FPGA; Iterative Stencil Algorithm; Performance Model; Artificial Intelligence; Computer Networks and Communications; Hardware and Architecture; Information Systems and Management
Appears in types: 04.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
PID5270707.pdf (Restricted access: Pre-Print (or Pre-Refereeing))	364.31 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1063567
