With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming available for analysis. Often the data collected have complex, graph based structures, which makes them difficult to process with traditional tools. Moreover, the irregularities in the data sets, and in the analysis algorithms, hamper the scaling of performance in large distributed highperformance systems, optimized for locality exploitation and regular data structures. In this paper we present an approach to system design that enable efficient execution of applications with irregular memory patterns on a distribute, many-core architecture, based on off-the-shelf cores. We introduce a set of hardware and software components, which provide a distributed global address space, fine-grained synchronization and transparently hide the latencies of remote accesses with multithreading. An FPGA prototype has been implemented to explore the design with a set of typical irregular kernels. We finally present an analytical model that highlights the benefits of the approach and help identifying the bottlenecks in the prototypes. The experimental evaluation on graph based applications demonstrates the scalability of the architecture for different configurations of the whole system.

Exploring efficient hardware support for applications with irregular memory patterns on multinode manycore architectures

CERIANI, MARCO;PALERMO, GIANLUCA
2017-01-01

Abstract

With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming available for analysis. Often the data collected have complex, graph based structures, which makes them difficult to process with traditional tools. Moreover, the irregularities in the data sets, and in the analysis algorithms, hamper the scaling of performance in large distributed highperformance systems, optimized for locality exploitation and regular data structures. In this paper we present an approach to system design that enable efficient execution of applications with irregular memory patterns on a distribute, many-core architecture, based on off-the-shelf cores. We introduce a set of hardware and software components, which provide a distributed global address space, fine-grained synchronization and transparently hide the latencies of remote accesses with multithreading. An FPGA prototype has been implemented to explore the design with a set of typical irregular kernels. We finally present an analytical model that highlights the benefits of the approach and help identifying the bottlenecks in the prototypes. The experimental evaluation on graph based applications demonstrates the scalability of the architecture for different configurations of the whole system.
2017
Computer architecture; Distributed computing; Field programmable gate arrays; High-performance computing; Irregular applications; Multithreaded architectures; Parallel architectures; Prototypes; Signal Processing; Hardware and Architecture; Computational Theory and Mathematics
File in questo prodotto:
File Dimensione Formato  
06871392.pdf

Accesso riservato

: Publisher’s version
Dimensione 912.36 kB
Formato Adobe PDF
912.36 kB Adobe PDF   Visualizza/Apri
11311-1026226 Palermo.pdf

accesso aperto

: Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione 1.61 MB
Formato Adobe PDF
1.61 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1026226
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact