A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model

Del Sozzo E.; Rabozzi M.; Di Tucci L.; Sciuto D.; Santambrogio M. D.
2021-01-01

Abstract

As reconfigurable fabrics have delivered steadily increasing performance over the years, Field-Programmable Gate Arrays (FPGAs) are becoming an appealing solution for next-generation High-Performance Computing (HPC) systems. However, to gain traction alongside traditional von Neumann architectures, the optimization process of FPGA designs should be abstracted to a higher level. Although High-Level Synthesis (HLS) already provides a convenient way to write FPGA code in procedural languages, substantial effort and expertise are still required to optimize the resulting design for the underlying hardware. To overcome this problem, we propose a semi-automated performance optimization methodology based on a Hierarchical Roofline model for FPGAs. System-wide and application-specific optimizations, such as off-chip memory transfer and data locality optimizations, are guided by the FPGA Roofline model, whereas FPGA-specific optimizations are automatically searched by a Design Space Exploration (DSE) engine. We demonstrate how this methodology makes it easy to analyze and optimize a wide set of applications, ranging from particle methods to wavefront algorithms and sparse arithmetic computations. In addition, we illustrate how the integrated DSE engine achieves a maximum speedup of 14.36x compared to previous automated solutions in the literature.

Keywords

Analytical models
Computational modeling
Data models
Engines
Estimation
Field programmable gate arrays
FPGA
Hardware Accelerator Design
High-Performance Computing
Optimization
Roofline performance model

Files in this item:
A_Comprehensive_Methodology_to_Optimize_FPGA_Designs_via_the_Roofline_Model.pdf
Access: open access
Version: Post-Print (draft or Author's Accepted Manuscript, AAM)
Size: 4.54 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1207688
Citations
  • Scopus: 2