FactorFlow: Mapping GEMMs on Spatial Architectures through Adaptive Programming and Greedy Optimization
Ronzani, Marco;Silvano, Cristina
2025-01-01
Abstract
General Matrix Multiplications (GEMMs) are fundamental kernels in tensor-based scientific applications and deep learning. Modern AI accelerators based on spatial architectures can run these kernels efficiently by exploiting parallelism and data reuse, but they require a specific mapping that plans data movements and computations. The choice of mapping significantly affects energy consumption and latency. Consequently, the vast space of possible mappings, which is unique to each GEMM-architecture pair, must be searched thoroughly to find optimal solutions. This is a complex optimization problem that calls for effective map-space exploration strategies. Current state-of-the-art mapping tools primarily address convolution kernels, a superset of GEMMs, but often fail to exploit the specific characteristics of GEMMs. As a result, they struggle to consistently generate optimal mappings in a reasonable time for all GEMM-architecture pairs. This paper introduces FactorFlow, an automatic framework that maps GEMM kernels to spatial architectures using adaptive programming and greedy optimization to minimize the energy-delay product (EDP). Our evaluation, conducted against four other state-of-the-art mapping tools on a selected set of GEMMs and architectures, demonstrates that FactorFlow consistently discovers mappings that outperform those of existing tools in terms of EDP while significantly reducing exploration time.
File: ASP_DAC2025_3658617.3697670.pdf (open access, publisher's version, 2.82 MB, Adobe PDF)


