Heterogeneous disaggregated systems represent a promising solution to deliver the performance required by next-generation High-Performance Computing (HPC) workloads. Nevertheless, their heterogeneity represents a significant challenge in the application development process, making it urgent to identify solutions to program these systems productively and efficiently without sacrificing performance. In this paper, we explore the potential of system topology information in addressing such a challenge. We delve into the foundations of system topology and how this information could be exploited in high-level programming libraries across all layers of the software infrastructures, from runtime systems to user-facing APIs. We propose K-Nearest Neighbors (KNN) as a simple case study, which we implement with a distributed programming library prototyped to exploit topology awareness. We demonstrate the solution’s effectiveness on a commodity cluster with GPU-equipped nodes and illustrate how it will apply to next-generation disaggregated hardware.

Programming the Future: the Essential Role of System Topology Awareness in Heterogeneous Disaggregated Environments

Branchini, Beatrice;Di Dio Lavore, Ian;Castellana, Vito Giovanni;Santambrogio, Marco
2024-01-01

Abstract

Heterogeneous disaggregated systems represent a promising solution to deliver the performance required by next-generation High-Performance Computing (HPC) workloads. Nevertheless, their heterogeneity represents a significant challenge in the application development process, making it urgent to identify solutions to program these systems productively and efficiently without sacrificing performance. In this paper, we explore the potential of system topology information in addressing such a challenge. We delve into the foundations of system topology and how this information could be exploited in high-level programming libraries across all layers of the software infrastructures, from runtime systems to user-facing APIs. We propose K-Nearest Neighbors (KNN) as a simple case study, which we implement with a distributed programming library prototyped to exploit topology awareness. We demonstrate the solution’s effectiveness on a commodity cluster with GPU-equipped nodes and illustrate how it will apply to next-generation disaggregated hardware.
2024
ACM International Conference Proceeding Series
Disaggregated Memory
Distributed Systems
GPU
Heterogeneous Computing
High-Performance Computing
Programming Libraries
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1285749
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact