GrOUT: Transparent Scale-Out to Overcome UVM's Oversubscription Slowdowns

Dio Lavore, Ian Di; Maffi, Davide; Santambrogio, Marco D.
2024-01-01

Abstract

Hardware accelerators have always been difficult to approach. In recent years, considerable effort has gone into simplifying their programming paradigms, especially on GPUs. This has led to the development of various domain-specific frameworks and microarchitectural features that ease some aspects of this multifaceted problem. One such feature is the Unified Virtual Memory (UVM) oversubscription mechanism, which allows the developer to handle datasets whose memory footprint exceeds the memory of the hardware accelerator. Although promising, current UVM incurs extreme overheads when running large workloads whose oversubscription factor (allocated vs. available memory) exceeds a per-workload threshold. In this work, we propose GrOUT, a language- and domain-agnostic framework that tackles the slowdowns caused by the UVM oversubscription mechanism. In particular, we show that a scale-out approach is a feasible solution to these slowdowns on workloads from various domains. Moreover, we design a framework capable of autonomously scaling out user-provided workloads, reaching a speedup of more than 24.42× with minimal changes to the application logic.
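As a rough illustration of the oversubscription factor mentioned in the abstract (allocated vs. available memory), the following minimal sketch assumes the factor is the plain ratio of allocated bytes to the memory available on the accelerator. The abstract does not give an exact formula, so the function name `oversubscription_factor` and the example sizes are hypothetical.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def oversubscription_factor(allocated_bytes: int, available_bytes: int) -> float:
    """Return allocated memory divided by available device memory.

    Assumption: a plain ratio. Values above 1.0 mean the workload
    allocates more memory than the device provides (oversubscription).
    """
    if available_bytes <= 0:
        raise ValueError("available_bytes must be positive")
    return allocated_bytes / available_bytes


# Hypothetical sizes: a 48 GiB dataset on a device with 16 GiB of memory.
factor = oversubscription_factor(48 * GIB, 16 * GIB)
print(f"oversubscription factor: {factor}")
```

Under this assumption, a factor at or below 1.0 means the dataset fits in device memory; the per-workload threshold above which the slowdowns appear is not stated in the abstract and is not modeled here.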
2024
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
979-8-3503-6460-6
Distributed processing; Computer languages; Runtime; Microarchitecture; Programming
Files in this item:
File: grout_HIPS24.pdf
Access: open access
Description: Post-Print (DRAFT or Author’s Accepted Manuscript, AAM)
Size: 924.18 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1270783