An FPGA-based acceleration methodology and performance model for iterative stencils

Reggiani, Enrico;Natale, Giuseppe;Santambrogio, Marco D.
2018-01-01

Abstract

Iterative stencils are the core computational kernel of many applications in different domains, from scientific computing to finance. Given their complex dependencies and low computation-to-memory-access ratio, these kernels are a challenging acceleration target on every architecture. This is especially true for FPGAs, whose direct hardware execution offers the possibility of high performance and power efficiency, but whose non-fixed architecture can lead to very large solution spaces to be explored. In this work, we build upon a previously presented FPGA-based acceleration methodology for iterative stencil algorithms, which provides a dataflow architectural template that implements optimal on-chip buffering and whose performance scales almost linearly using a technique denoted as iterations queuing. In particular, we propose a set of design improvements and develop an accurate analytical performance model that can be used to support the exploration of the design space. Experimental results, obtained by implementing a set of benchmarks from different application domains on a Xilinx VC707 board, show an average increase in performance and power efficiency over the previous work of 22.8x and 9.9x, respectively, and an average prediction error of 0.5%.
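For readers unfamiliar with the term, the following is a minimal software sketch of an iterative stencil. It is an illustration only and is not taken from the paper: the 4-point Jacobi update, the function names `jacobi_step` and `iterative_stencil`, and the fixed-boundary handling are all assumptions chosen for brevity. It shows the two properties the abstract refers to: each output cell needs only a few arithmetic operations per memory read, and each iteration depends on the complete result of the previous one.

```python
def jacobi_step(grid):
    """Apply one 4-point Jacobi update to the interior of a 2D grid.

    Hypothetical example kernel; boundary cells are left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so that reads see only old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Four neighbour reads for three additions and one multiplication:
            # a low computation-to-memory-access ratio.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def iterative_stencil(grid, iterations):
    """Repeat the stencil; iteration t+1 consumes the output of iteration t."""
    for _ in range(iterations):
        grid = jacobi_step(grid)
    return grid
```

In this sequential form every iteration sweeps the whole grid through memory again, which is the kind of cost that on-chip buffering and chaining of iterations, as described in the abstract, aim to reduce.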
2018
Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
9781538655559
Dataflow Architectures; FPGA; Iterative Stencil Algorithm; Performance Model; Artificial Intelligence; Computer Networks and Communications; Hardware and Architecture; Information Systems and Management
Files in this item:
PID5270707.pdf
Restricted access
Type: Pre-Print (or Pre-Refereeing)
Size: 364.31 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1063567