Over the past few years, GPUs have found widespread adoption in many scientific domains, offering notable performance and energy efficiency advantages compared to CPUs. However, optimizing GPU high-performance kernels poses challenges given the complexities of GPU architectures and programming models. Moreover, current GPU development tools provide few high-level suggestions and overlook the underlying hardware. Here we present Starlight, an open-source, highly flexible tool for enhancing GPU kernel analysis and optimization. Starlight autonomously describes Roofline Models, examines performance metrics, and correlates these insights with GPU architectural bottlenecks. Additionally, Starlight predicts potential performance enhancements before altering the source code. We demonstrate its efficacy by applying it to literature genomics and physics applications, attaining speedups from 1.1× to 2.5× over state-of-the-art baselines. Furthermore, Starlight supports the development of new GPU kernels, which we exemplify through an image processing application, showing speedups of 12.7× and 140× when compared against state-of-the-art FPGA- and GPU-based solutions.

Starlight: A kernel optimizer for GPU processing

Alberto Zeni;Emanuele del Sozzo;Eleonora D'Arnese;Davide Conficconi;Marco Domenico Santambrogio
2024-01-01

Abstract

Over the past few years, GPUs have found widespread adoption in many scientific domains, offering notable performance and energy efficiency advantages compared to CPUs. However, optimizing GPU high-performance kernels poses challenges given the complexities of GPU architectures and programming models. Moreover, current GPU development tools provide few high-level suggestions and overlook the underlying hardware. Here we present Starlight, an open-source, highly flexible tool for enhancing GPU kernel analysis and optimization. Starlight autonomously describes Roofline Models, examines performance metrics, and correlates these insights with GPU architectural bottlenecks. Additionally, Starlight predicts potential performance enhancements before altering the source code. We demonstrate its efficacy by applying it to literature genomics and physics applications, attaining speedups from 1.1× to 2.5× over state-of-the-art baselines. Furthermore, Starlight supports the development of new GPU kernels, which we exemplify through an image processing application, showing speedups of 12.7× and 140× when compared against state-of-the-art FPGA- and GPU-based solutions.
2024
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0743731523002022-main.pdf

accesso aperto

Dimensione 1.43 MB
Formato Adobe PDF
1.43 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1258543
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact