A CAD-based methodology to optimize HLS code via the Roofline model

Emanuele Del Sozzo;Lorenzo Di Tucci;Marco D. Santambrogio
2020-01-01

Abstract

The intrinsic complexity of modern computing systems requires structured methods for analyzing and optimizing application performance. In this context, the Roofline model offers an intuitive, visual method that provides performance insight and optimization guidance for a given architecture. Although this methodology successfully models performance optimizations on multicore CPUs and GPUs, its original formulation does not directly apply to FPGA devices. For this reason, we propose a Roofline model analysis for reconfigurable architectures and an associated CAD tool that assists the HLS optimization of C/C++ applications. We first model the attainable performance of the FPGA with an analytical method. Then, we integrate locality walls and a DSE engine to enhance the optimization process. Starting from a software version of the N-body algorithm, we first illustrate how our methodology helps quickly achieve performance comparable to a state-of-the-art bespoke FPGA implementation. Then, we illustrate an assisted platform porting of the Smith-Waterman sequence alignment algorithm that provides a 9x speedup. Finally, we evaluate the DSE engine on its own on the Poly-Bench test suite and achieve performance improvements of up to 14.36x compared to previous automated solutions in the literature.
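
For readers unfamiliar with the Roofline model mentioned above, the following minimal sketch shows the bound used in the original (CPU/GPU) formulation: attainable performance is the minimum of peak compute throughput and memory bandwidth multiplied by operational intensity. It does not reproduce the FPGA-specific analytical model, the locality walls, or the DSE engine proposed in the paper; the function names and numeric values are illustrative assumptions.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Classic Roofline bound (not the paper's FPGA extension).

    peak_gflops   -- peak compute throughput in GFLOP/s (assumed value)
    bandwidth_gbs -- peak memory bandwidth in GB/s (assumed value)
    intensity     -- operational intensity in FLOP per byte moved
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Operational intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 10 GB/s memory bandwidth.
    for oi in (0.5, 2.0, 10.0, 50.0):
        print(oi, attainable_gflops(100.0, 10.0, oi))
```

A kernel whose operational intensity lies below the ridge point is limited by memory bandwidth, while one above it is limited by compute; this distinction is what makes the model useful as optimization guidance.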
2020
ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
978-1-6654-2324-3
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1169314