Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9×9 or larger mask sizes, even in bandwidth-constrained systems.

The Case for Polymorphic Registers in Dataflow Computing

PILATO, CHRISTIAN;SCIUTO, DONATELLA
2018-01-01

Abstract

Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9×9 or larger mask sizes, even in bandwidth-constrained systems.
2018
Dataflow computing Parallel memory accesses Polymorphic register file Bandwidth Vector lanes Convolution High performance computing High-level synthesis
File in questo prodotto:
File Dimensione Formato  
10.1007_s10766-017-0494-1.pdf

accesso aperto

Descrizione: articolo
: Publisher’s version
Dimensione 4.67 MB
Formato Adobe PDF
4.67 MB Adobe PDF Visualizza/Apri
Ciobanu2018_Article_TheCaseForPolymorphicRegisters.pdf

Accesso riservato

Descrizione: Version of Record
: Publisher’s version
Dimensione 4.37 MB
Formato Adobe PDF
4.37 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1022237
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 1
social impact