Exploring heterogeneous task-level parallelism in a BMA video coding application using system-level simulation

Miele A.;
2018-01-01

Abstract

High-abstraction-level models can be used within system-level simulation to allow rapid evaluation of architectural aspects in early Design Space Exploration (DSE) and to guide development decisions. Early DSE is of paramount importance in the specification of future Embedded Systems (ES) and in their evaluation for applications with high computing demands and energy restrictions. This paper presents the exploration of Heterogeneous Task-Level Parallelism (HTLP) in a Block-Matching Algorithm (BMA) video coding application. HTLP means the creation and simultaneous execution of threads of kernels defined for different types of Processing Elements (PE), e.g., CPU and GPU, all serving the same purpose. We employ a BMA implementation as a case study and use its characteristics to explore HTLP, in particular its kernels for data preparation, SAD (sum of absolute differences) criterion calculation, and SAD value grouping. For the exploration, a system-level simulation framework (SAVE-htlp) is extended to support HTLP. In the experiments, SAVE-htlp simulates workload and architecture models and explores 22 settings that vary the PE type employed during task execution and the number of concurrent threads for each kernel. Execution time, performance, energy, and power results show HTLP settings outperforming both CPU-only and GPU-only settings.
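To make the SAD criterion mentioned in the abstract concrete, the following is a minimal sketch of block matching by exhaustive (full) search. It is not taken from the paper or from SAVE-htlp: the function names `sad` and `best_match`, the list-of-lists frame representation, and the choice of full search are all assumptions made for illustration only.

```python
# Illustrative sketch only; names and search strategy are assumptions,
# not the paper's BMA implementation.

def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks.

    The block in frame_a starts at column ax, row ay; the block in
    frame_b starts at column bx, row by.
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def best_match(current, reference, x, y, size, search_range):
    """Full search for the block of `current` at (x, y).

    Returns (lowest_sad, (mvx, mvy)), where (mvx, mvy) is the
    displacement into `reference` with the lowest SAD, or None if no
    candidate block fits inside the reference frame.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = x + mvx, y + mvy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, reference, x, y, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best
```

In this sketch each candidate SAD is independent of the others, which is the kind of property that would allow such work to be split across threads on different processing elements; how the paper actually partitions its kernels is not specified here.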
2018
Brazilian Symposium on Computing System Engineering, SBESC
978-1-7281-0240-5
Block-matching algorithm; Embedded systems; Heterogeneous task-level parallelism; System-level simulation
Files in this record:

08691923.pdf: Publisher's version, restricted access, 655.01 kB, Adobe PDF
Exploring_Heterogeneous_Task_Level_Parallelismin_a_BMA_Video_Coding_Application_usingSystem_Level_Simulation.pdf: Post-print (draft or Author's Accepted Manuscript, AAM), open access, 668.76 kB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1105849