Considerations on the use of custom accelerators for big data analytics
Castellana, Vito Giovanni;Tumeo, Antonino;Minutoli, Marco;Lattuada, Marco;Ferrandi, Fabrizio
2017-01-01
Abstract
Accelerators, including graphics processing units (GPUs) for general-purpose computation and manycore designs with wide vector units (e.g., Intel Xeon Phi), have become a common component of many high-performance clusters. The appearance of more stable and reliable tools that automatically convert annotated code written in high-level languages (such as C or C++) to hardware description languages (high-level synthesis, HLS) is also setting the stage for broader use of reconfigurable devices (e.g., field-programmable gate arrays, FPGAs) to implement custom accelerators in high-performance systems, helped by the fact that new processors include advanced cache-coherent interconnects for these components. In this chapter, we briefly survey the status of accelerator use in high-performance systems targeted at big data analytics applications. Although recent progress in the use of accelerators for this class of applications has been significant, we argue that, unlike in scientific simulations, there are still gaps to close. This is particularly true for the "irregular" behaviors exhibited by emerging NoSQL and graph databases. We focus on the limits of HLS tools for data analytics and graph methods, and discuss a new architectural template that better fits the requirements of this class of applications. We validate the new architectural template by modifying the Graph Engine for Multithreaded System (GEMS) framework to support accelerators generated with this methodology, and by testing it with queries from the Lehigh University Benchmark (LUBM). The architectural template better supports the task- and memory-level parallelism present in graph methods by providing a new control model and an enhanced memory interface. We show that our solution makes it possible to generate parallel accelerators that provide speedups with respect to conventional HLS flows.
Finally, we draw conclusions and present a perspective on the use of reconfigurable devices and design automation tools for data analytics.