Svelto: High-Level Synthesis of Multi-Threaded Accelerators for Graph Analytics

Ferrandi, Fabrizio
2021-01-01

Abstract

Graph analytics are an emerging class of irregular applications. Operating on very large datasets, they exhibit distinctive behaviors, such as fine-grained, unpredictable memory accesses and highly unbalanced task-level parallelism, that make existing general-purpose processors and accelerators (e.g., GPUs) suboptimal or difficult to program. To address these issues, research and industry increasingly rely on designs based on reconfigurable devices (Field Programmable Gate Arrays, FPGAs), sometimes partially employing High-Level Synthesis (HLS) methods to speed up accelerator development. In this paper, we propose a novel architecture template for the automatic generation of accelerators for graph analytics and irregular applications. The architecture template includes a dynamic task scheduler, a parallel array of accelerators that supports task-level parallelism with context switching, and a multi-channel memory interface that decouples communication from computation and supports fine-grained atomic memory operations. We discuss the integration of the architecture template into an HLS flow, presenting the modifications needed to automatically generate the accelerators from OpenMP-annotated code. We evaluate our approach by synthesizing custom designs for a set of graph database benchmark queries, and we compare the synthesized accelerators with previous state-of-the-art methodologies for the synthesis of parallel architectures.
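As an illustration of the kind of OpenMP-annotated kernel the abstract refers to, the following is a minimal sketch and is not taken from the paper. The `csr_graph` type and the `count_in_degrees` function are hypothetical names chosen for this example, and the exact subset of OpenMP accepted by the Svelto flow is not stated on this page. The loop shows the two behaviors the abstract highlights: unbalanced per-vertex work (hence the dynamic schedule) and fine-grained atomic updates at unpredictable addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CSR (compressed sparse row) graph layout, used only for
 * this illustration; it is not the data structure from the paper. */
typedef struct {
    size_t num_vertices;
    const size_t *row_offsets;   /* length: num_vertices + 1 */
    const uint32_t *neighbors;   /* length: row_offsets[num_vertices] */
} csr_graph;

/* Counts, for every vertex, how many edges point to it (its in-degree). */
void count_in_degrees(const csr_graph *g, uint32_t *in_degree)
{
    assert(g != NULL && in_degree != NULL);

    for (size_t v = 0; v < g->num_vertices; ++v)
        in_degree[v] = 0;

    /* One loop iteration per source vertex. Vertex degrees vary widely,
     * so iterations are handed out dynamically rather than in fixed chunks. */
    #pragma omp parallel for schedule(dynamic)
    for (size_t v = 0; v < g->num_vertices; ++v) {
        for (size_t e = g->row_offsets[v]; e < g->row_offsets[v + 1]; ++e) {
            uint32_t dst = g->neighbors[e];
            /* The destination is data dependent, so concurrent iterations
             * may hit the same counter; the update must be atomic. */
            #pragma omp atomic update
            in_degree[dst] += 1;
        }
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs sequentially with the same result, which makes the kernel easy to check in software before any synthesis step.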
Task Analysis, Parallel Processing, Computer Architecture, Dynamic Scheduling, Hardware, Field Programmable Gate Arrays, Memory Management, Parallel Architectures, Multi Threading, Dynamic Task Scheduling, Context Switching, RDF, SPARQL, High Performance Data Analytics, Big Data
Files in this item:
File: tc_svelto.pdf
Access: open access
Description: Post-Print (DRAFT or Author's Accepted Manuscript, AAM)
Size: 4.46 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1161042
Citations
  • PMC: not available
  • Scopus: 11
  • ISI: 7