RE.PUBLIC@POLIMI research publications of the Politecnico di Milano

A Synthesis Methodology for Intelligent Memory Interfaces in Accelerator Systems

Limaye, Ankur; Bohm Agostini, Nicolas; Barone, Claudio; Castellana, Vito Giovanni; Fiorito, Michele; Ferrandi, Fabrizio; Marquez, Andres; Tumeo, Antonino

2025-01-01

Abstract

Domain-specific systems improve the performance of specific applications compared to general-purpose processing systems by deploying custom hardware accelerators. These hardware accelerators are generated using high-level synthesis (HLS) tools. HLS tools enable comprehensive design space exploration that optimizes the accelerators' compute performance. However, they often ignore the challenges of integrating the accelerators into a system-on-chip, particularly how the accelerators access memory. Our work introduces a buffering system design that improves accelerators' memory accesses by intelligently employing burst transactions to prefetch useful data from external memory into on-chip local buffers. Our design is dynamic, parametric, and transparent to the accelerators generated by HLS tools. We derive the buffering system parameters using compiler-based analysis passes and memory channel latency constraints. For a set of PolyBench kernels, the proposed buffering system yields an average 8.8× performance improvement while lowering memory channel utilization by 53.2%.
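The abstract describes the buffering system only at a high level. As a rough illustration of why burst prefetching into a local buffer helps, and why its parameters must match the access pattern, the Python sketch below compares a baseline with a buffered interface. It is not the authors' design. The function names (`direct_access`, `buffered_access`), the single-block buffer, and the cycle model are all hypothetical assumptions chosen for illustration: a fixed round-trip latency, one word per cycle after the first beat of a burst, and one-cycle hits.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Stats:
    transactions: int = 0  # requests issued on the external memory channel
    cycles: int = 0        # total cycles under the toy (assumed) latency model


def direct_access(addresses: Iterable[int], latency: int) -> Stats:
    """Baseline: every word the accelerator reads is a separate single-beat transaction."""
    s = Stats()
    for _ in addresses:
        s.transactions += 1
        s.cycles += latency  # pay the full round-trip latency for every word
    return s


def buffered_access(addresses: Iterable[int], latency: int, burst_len: int) -> Stats:
    """Hypothetical single-block buffer filled by aligned bursts of burst_len words.

    Miss: one burst transaction costing latency + (burst_len - 1) cycles
          (first beat arrives after `latency`, then one word per cycle).
    Hit:  the word is served from the on-chip buffer in one cycle.
    """
    s = Stats()
    buffered_base: Optional[int] = None  # aligned base address currently held on chip
    for addr in addresses:
        base = addr - addr % burst_len
        if base != buffered_base:
            # Miss: fetch the whole aligned block with a single burst.
            s.transactions += 1
            s.cycles += latency + burst_len - 1
            buffered_base = base
        else:
            # Hit: data already prefetched into the local buffer.
            s.cycles += 1
    return s


if __name__ == "__main__":
    LATENCY, BURST = 20, 16                # assumed, illustrative values
    sequential = list(range(64))           # unit-stride reads (e.g. a matrix row)
    strided = list(range(0, 64 * 16, 16))  # stride == BURST (e.g. a matrix column)
    for name, trace in (("sequential", sequential), ("strided", strided)):
        d = direct_access(trace, LATENCY)
        b = buffered_access(trace, LATENCY, BURST)
        print(f"{name}: direct={d}, buffered={b}")
```

With these assumed parameters, the sequential trace drops from 64 transactions and 1,280 cycles to 4 transactions and 200 cycles. The strided trace, by contrast, still issues 64 transactions and grows to 2,240 cycles, because each burst is discarded after one useful word. This contrast shows why burst length and buffer organization cannot be fixed globally. The abstract states that these parameters are derived from compiler-based analysis passes and memory channel latency constraints.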

Year of publication: 2025
Book title: Proceedings of the 30th Asia and South Pacific Design Automation Conference
Appears in categories: 04.1 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1285593