
To Cache or not to Cache? Exploring the Design Space of Tunable, HLS-generated Accelerators

Giovanni Gozzi; Michele Fiorito; Fabrizio Ferrandi
2024-01-01

Abstract

In recent years, hardware specialization has become the dominant strategy for accelerating the performance of emerging applications. However, designing and integrating custom hardware is a complex, time-consuming, and costly process. High-Level Synthesis (HLS) tools mitigate these issues by simplifying the design, prototyping, and deployment of hardware accelerators on reconfigurable devices (e.g., FPGAs) attached to computing systems. Conventional HLS tools aim at exploiting instruction-level parallelism to achieve superior performance, and they generate accelerators with fine-grained memory accesses. The limited on-device memory of FPGAs necessitates frequent direct accesses to external memory to satisfy the data requirements of these applications. Thus, improving memory bandwidth utilization is challenging for HLS flows and requires rethinking the accelerators' memory subsystem. In this paper, we present a synthesis methodology that seamlessly introduces custom caches into HLS-generated designs. The key feature of our approach is the ability to fine-tune cache parameters to the specific accelerator. This is crucial because different workloads exhibit varying spatial and temporal locality, leading to different performance trade-offs. Our methodology also allows for efficient design space exploration without requiring user modifications to the input specification. We demonstrate the practical application of our methodology by studying the effects of various cache configurations on the performance of selected kernels. Based on this study, we also present a heuristic that guides users in fine-tuning cache parameters. Our results show that fine-tuning caches can improve performance by up to 3.5× with a 13% resource overhead compared to conventional non-cached designs.
ACM International Conference Proceeding Series, 2024
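The abstract explains that the benefit of an accelerator-side cache depends on the kernel's spatial and temporal locality, which is why the cache parameters are exposed for design space exploration. As a purely illustrative aid, and not the authors' actual methodology, tool flow, or cost model, the following self-contained C++ sketch shows how a sweep over hypothetical cache parameters (line size, number of sets, associativity) could be prototyped with a toy trace-driven, LRU set-associative cache model; every name, parameter value, and the synthetic access pattern are assumptions made only for this sketch.

// Illustrative sketch only: a toy trace-driven model that sweeps hypothetical
// cache configurations over a synthetic external-memory access trace and
// prints the hit rate of each design point.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct CacheConfig {
    std::size_t line_bytes;     // bytes per cache line
    std::size_t num_sets;       // number of sets
    std::size_t associativity;  // ways per set
};

// Simulate an LRU set-associative cache over a sequence of byte addresses
// and return the fraction of accesses that hit.
double hit_rate(const std::vector<std::uint64_t>& trace, const CacheConfig& cfg) {
    // sets[s] holds the tags currently cached in set s, most recently used first.
    std::vector<std::vector<std::uint64_t>> sets(cfg.num_sets);
    std::size_t hits = 0;
    for (std::uint64_t addr : trace) {
        const std::uint64_t block = addr / cfg.line_bytes;
        const std::size_t set = block % cfg.num_sets;
        const std::uint64_t tag = block / cfg.num_sets;
        auto& ways = sets[set];
        bool hit = false;
        for (std::size_t i = 0; i < ways.size(); ++i) {
            if (ways[i] == tag) {
                ways.erase(ways.begin() + i);    // move the line to the MRU slot
                ways.insert(ways.begin(), tag);
                ++hits;
                hit = true;
                break;
            }
        }
        if (!hit) {
            ways.insert(ways.begin(), tag);      // allocate on miss
            if (ways.size() > cfg.associativity)
                ways.pop_back();                 // evict the LRU line
        }
    }
    return static_cast<double>(hits) / static_cast<double>(trace.size());
}

int main() {
    // Synthetic access pattern: a unit-stride sweep over a 16 KiB array,
    // repeated twice, mimicking a kernel that re-reads data from external memory.
    std::vector<std::uint64_t> trace;
    for (int rep = 0; rep < 2; ++rep)
        for (std::uint64_t i = 0; i < 4096; ++i)
            trace.push_back(i * 4);  // 4-byte elements

    // Hypothetical design space: a few line sizes, set counts, and way counts.
    for (std::size_t line : {16, 32, 64})
        for (std::size_t sets : {16, 64, 256})
            for (std::size_t assoc : {1, 2, 4}) {
                const CacheConfig cfg{line, sets, assoc};
                std::cout << "line=" << line << "B sets=" << sets
                          << " ways=" << assoc
                          << " hit rate=" << hit_rate(trace, cfg) << '\n';
            }
    return 0;
}

Under these assumptions, the hit rate grows with line size while the array is streamed (spatial locality) and jumps again once the modeled capacity (line_bytes × num_sets × associativity) covers the re-read working set (temporal locality), which is the kind of workload-dependent trade-off the paper's exploration of tunable caches targets.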

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1285590