Programming the Future: the Essential Role of System Topology Awareness in Heterogeneous Disaggregated Environments
Branchini, Beatrice;Di Dio Lavore, Ian;Castellana, Vito Giovanni;Santambrogio, Marco
2024-01-01
Abstract
Heterogeneous disaggregated systems are a promising way to deliver the performance required by next-generation High-Performance Computing (HPC) workloads. Their heterogeneity, however, poses a significant challenge for application development, making it urgent to find ways to program these systems productively without sacrificing performance. In this paper, we explore the potential of system topology information in addressing this challenge. We examine the foundations of system topology and how this information can be exploited in high-level programming libraries across all layers of the software infrastructure, from runtime systems to user-facing APIs. As a simple case study, we implement K-Nearest Neighbors (KNN) with a distributed programming library prototyped to exploit topology awareness. We demonstrate the solution's effectiveness on a commodity cluster with GPU-equipped nodes and illustrate how it will apply to next-generation disaggregated hardware.