RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano

Exploring efficient hardware support for applications with irregular memory patterns on multinode manycore architectures

Ceriani, Marco; Secchi, Simone; Villa, Oreste; Tumeo, Antonino; Palermo, Gianluca

2017-01-01

Abstract

With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming available for analysis. Often the data collected have complex, graph-based structures, which make them difficult to process with traditional tools. Moreover, the irregularities in the data sets and in the analysis algorithms hamper performance scaling on large distributed high-performance systems, which are optimized for locality exploitation and regular data structures. In this paper we present an approach to system design that enables efficient execution of applications with irregular memory patterns on a distributed, many-core architecture based on off-the-shelf cores. We introduce a set of hardware and software components that provide a distributed global address space and fine-grained synchronization, and that transparently hide the latencies of remote accesses with multithreading. An FPGA prototype has been implemented to explore the design with a set of typical irregular kernels. Finally, we present an analytical model that highlights the benefits of the approach and helps identify the bottlenecks in the prototypes. The experimental evaluation on graph-based applications demonstrates the scalability of the architecture for different configurations of the whole system.

Publication year: 2017

Journal title: IEEE Transactions on Parallel and Distributed Systems

Keywords: Computer architecture; Distributed computing; Field programmable gate arrays; High-performance computing; Irregular applications; Multithreaded architectures; Parallel architectures; Prototypes; Signal Processing; Hardware and Architecture; Computational Theory and Mathematics

Appears in types: 01.1 Journal Article

Files in this record:

File	Access	Version	Size	Format
06871392.pdf	Restricted access	Publisher’s version	912.36 kB	Adobe PDF
11311-1026226 Palermo.pdf	Open access	Post-Print (draft or Author’s Accepted Manuscript, AAM)	1.61 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1026226
