Svelto: High-Level Synthesis of Multi-Threaded Accelerators for Graph Analytics
Ferrandi, Fabrizio
2021-01-01
Abstract
Graph analytics are an emerging class of irregular applications. Operating on very large datasets, they exhibit distinctive behaviors, such as fine-grained, unpredictable memory accesses and highly unbalanced task-level parallelism, that make existing general-purpose processors and accelerators (e.g., GPUs) suboptimal or difficult to program. To address these issues, research and industry increasingly rely on designs based on reconfigurable devices (Field Programmable Gate Arrays, FPGAs), sometimes partially employing High-Level Synthesis (HLS) methods to speed up the development of the accelerators. In this paper, we propose a novel architecture template for the automatic generation of accelerators for graph analytics and irregular applications. The architecture template includes a dynamic task scheduler, a parallel array of accelerators that supports task-level parallelism with context switching, and a related multi-channel memory interface that decouples communication from computation and provides support for fine-grained atomic memory operations. We discuss the integration of the architecture template in an HLS flow, presenting the modifications needed to enable automatic generation of the accelerators starting from OpenMP-annotated code. We evaluate our approach by synthesizing custom designs for a set of graph database benchmark queries. We compare the synthesized accelerators with previous state-of-the-art methodologies for the synthesis of parallel architectures.

File | Access | Version | Size | Format
---|---|---|---|---
tc_svelto.pdf | Open access | Post-print (draft or Author's Accepted Manuscript, AAM) | 4.46 MB | Adobe PDF
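To make the abstract's notion of "OpenMP annotated code" with fine-grained atomic memory operations more concrete, the following is a minimal illustrative sketch in C. It is not taken from the paper: the `csr_graph` type and the `in_degrees` function are hypothetical names, and the kernel (counting in-degrees over a compressed sparse row graph) is chosen only because it shows data-dependent, unpredictable memory accesses and uneven work per iteration. How the Svelto flow would actually map such a loop to hardware is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR graph: row_ptr has n + 1 entries and
   col_idx has row_ptr[n] entries (the edge targets). */
typedef struct {
    int n;
    const int *row_ptr;
    const int *col_idx;
} csr_graph;

/* Count incoming edges per vertex. Each outer iteration is an
   independent task whose cost depends on the vertex's out-degree,
   so the workload is unbalanced; the writes to deg[] hit
   data-dependent addresses and need atomic updates. */
void in_degrees(const csr_graph *g, int *deg)
{
    assert(g != NULL && deg != NULL);

    for (int v = 0; v < g->n; v++) {
        deg[v] = 0;
    }

    /* Dynamic scheduling hands out iterations as workers become free. */
    #pragma omp parallel for schedule(dynamic)
    for (int u = 0; u < g->n; u++) {
        for (int e = g->row_ptr[u]; e < g->row_ptr[u + 1]; e++) {
            int v = g->col_idx[e];
            /* Several tasks may target the same vertex concurrently. */
            #pragma omp atomic
            deg[v]++;
        }
    }
}
```

For a three-vertex graph with edges 0→1, 0→2 and 1→2, calling `in_degrees` fills `deg` with 0, 1 and 2. The code also compiles without OpenMP support, in which case the pragmas are ignored and the loop runs serially.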
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.