Exploring hardware support for scaling irregular applications on multi-node multi-core architectures
2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors

PALERMO, GIANLUCA;
2013-01-01

Abstract

With the recent emergence of large-scale knowledge discovery, data mining, and social network analysis, irregular applications have gained renewed interest. Cache-based architectures do not deliver optimal performance on such workloads, mainly because of the low spatial and temporal locality of their control and memory access patterns. This paper presents a multi-node, multi-core, multi-threaded shared-memory system architecture designed for executing large-scale irregular applications and built on three pillars that support these workloads. First, transparent hardware support for Partitioned Global Address Space (PGAS) provides a large globally shared address space with no software library overhead. Second, multithreaded multi-core processing nodes provide the latency tolerance required when accessing physically distributed global memory. Third, hardware support is provided for inter-thread synchronization on the global address space. An analytical performance model that accounts for the main architecture and application characteristics is presented. The hardware design of the proposed custom architectural building blocks is then described. Finally, a multi-board FPGA prototype of the proposed system is presented and evaluated with typical irregular kernels and benchmarks. The experimental evaluation demonstrates the scalability of the architecture's performance across different configurations of the whole system.
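The first two pillars can be illustrated with a small sketch. The abstract gives neither the address mapping nor the analytical model, so everything below is an assumption made for illustration: `PgasLayout` is a hypothetical block-partitioned mapping from a global address to an owner node and local offset, and `core_utilization` is the textbook saturation bound for hiding memory latency with multiple hardware threads, not the authors' model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PgasLayout:
    """Hypothetical block-partitioned global address space (assumed, not from the paper)."""
    num_nodes: int
    bytes_per_node: int

    def locate(self, global_addr: int) -> Tuple[int, int]:
        """Map a global address to (owner node, offset in that node's memory)."""
        if not 0 <= global_addr < self.num_nodes * self.bytes_per_node:
            raise ValueError("address outside the global address space")
        # Block partitioning: node = addr // block size, offset = addr % block size.
        return divmod(global_addr, self.bytes_per_node)


def core_utilization(threads: int, work_cycles: float, latency_cycles: float) -> float:
    """Textbook latency-hiding bound (assumed model).

    Each thread computes for work_cycles, then waits latency_cycles for a
    memory reference. With enough threads the core saturates at 1.0.
    """
    if threads < 1 or work_cycles <= 0 or latency_cycles < 0:
        raise ValueError("invalid parameters")
    return min(1.0, threads * work_cycles / (work_cycles + latency_cycles))


if __name__ == "__main__":
    layout = PgasLayout(num_nodes=4, bytes_per_node=1024)
    print(layout.locate(2050))            # (2, 2): owned by node 2, offset 2
    print(core_utilization(1, 10, 90))    # a single thread mostly waits on memory
    print(core_utilization(16, 10, 90))   # enough threads to saturate the core
```

Under these assumed numbers, a remote reference that costs nine times the compute interval needs about ten threads per core before memory latency stops limiting throughput, which is the kind of trade-off a multithreaded design targets.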
Year: 2013
Conference: 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors
ISBN: 9781479904938
ISBN: 9781479904945
Keywords: irregular applications; multi-core architectures; Partitioned Global Address Space (PGAS); HPC
Files in this item:
File: ASAP13.pdf
Access: restricted
Version: Post-print (draft or Author's Accepted Manuscript, AAM)
Size: 4.21 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/823934