When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes each application kernel for a given platform and 2) a task-mapping phase that maximizes the overall application throughput by exploiting concurrency in the application task graph. The tuning phase is suitable for customizing parameterized OpenCL kernels considering device-specific constraints. Then, the mapping phase improves task-level parallelism for multi-device execution accounting for the overhead of memory transfers --- overheads implied by multiple OpenCL contexts for different device vendors. Benefits of the proposed design flow have been assessed on a stereo-matching application targeting two commercial heterogeneous platforms.

Customization of OpenCL Applications for Efficient Task Mapping Under Heterogeneous Platform Constraints

PAONE, EDOARDO;PALERMO, GIANLUCA;ZACCARIA, VITTORIO;SILVANO, CRISTINA
2015-01-01

Abstract

When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes each application kernel for a given platform and 2) a task-mapping phase that maximizes the overall application throughput by exploiting concurrency in the application task graph. The tuning phase is suitable for customizing parameterized OpenCL kernels considering device-specific constraints. Then, the mapping phase improves task-level parallelism for multi-device execution accounting for the overhead of memory transfers --- overheads implied by multiple OpenCL contexts for different device vendors. Benefits of the proposed design flow have been assessed on a stereo-matching application targeting two commercial heterogeneous platforms.
2015
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition
978-3-9815370-4-8
OpenCL, Heterogeneous architectures, autotuing, task level parallelism, GPGPU
File in questo prodotto:
File Dimensione Formato  
DATE15.pdf

Accesso riservato

Descrizione: Articolo principale
: Publisher’s version
Dimensione 945.01 kB
Formato Adobe PDF
945.01 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/985458
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 17
  • ???jsp.display-item.citation.isi??? 14
social impact