An FPGA-based acceleration methodology and performance model for iterative stencils
Reggiani, Enrico; Natale, Giuseppe; Santambrogio, Marco D.
2018-01-01
Abstract
Iterative stencils are the core computational kernel of many applications across different domains, from scientific computing to finance. Given their complex dependencies and low computation-to-memory-access ratio, these kernels are a challenging acceleration target on every architecture. This is especially true for FPGAs, whose direct hardware execution offers the possibility of high performance and power efficiency, but whose non-fixed architecture can lead to very large solution spaces to be explored. In this work, we build upon a previously presented FPGA-based acceleration methodology for iterative stencil algorithms, in which we provide a dataflow architectural template that implements optimal on-chip buffering and scales almost linearly in performance using a technique denoted as iterations queuing. In particular, we propose a set of design improvements and develop an accurate analytical performance model that can be used to support design space exploration. Experimental results, obtained by implementing a set of benchmarks from different application domains on a Xilinx VC707 board, show average performance and power efficiency increases over the previous work of 22.8x and 9.9x, respectively, and an average prediction error of 0.5%.

File | Description | Size | Format |
---|---|---|---|
PID5270707.pdf | Restricted access: Pre-Print (or Pre-Refereeing) | 364.31 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.