Dynamic Resource Allocation for Deadline-Constrained Neural Network Training
Luciano Baresi; Marco Garlini; Giovanni Quattrocchi
2025-01-01
Abstract
Neural networks (NNs) serve as the backbone of many applications, including computer vision, speech recognition, and natural language processing. Due to their iterative nature, training NNs is a highly compute-intensive task that is typically executed on a statically allocated set of devices (e.g., CPUs or GPUs). Static allocation prevents adjusting priorities: resources cannot be reassigned to urgent tasks, so high-priority training jobs may miss their expected completion times. This paper proposes DECOR-NN (DEadline COnstrained Resource allocation for Neural Networks), a control mechanism for NN training that dynamically allocates resources according to a user-defined deadline (i.e., a Service Level Agreement), ensuring that the training phase completes within the specified time. The solution leverages control theory and is built on top of PyTorch, a widely used framework for training NNs. DECOR-NN dynamically allocates either GPUs or fractions of CPUs to meet user deadlines, and it also allows users to modify the deadline at runtime to accommodate changes in job priorities. A comprehensive empirical evaluation on three benchmark applications shows that DECOR-NN completes training jobs with an average deviation from the deadline of only 1.75%.
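For intuition, the sketch below illustrates the general idea of a deadline-driven training loop in PyTorch. It is not DECOR-NN's actual controller, which the paper derives from control theory and which also manages GPUs and runtime deadline changes; this sketch approximates "fractions of CPUs" with PyTorch's intra-op thread count, and the function name, the proportional step, and the 0.8 slack factor are assumptions made purely for illustration.

```python
import time
import torch

def train_with_deadline(model, loader, optimizer, loss_fn,
                        total_epochs, deadline_s,
                        min_threads=1, max_threads=None):
    """Hypothetical deadline-aware training loop (illustrative only,
    not the paper's controller): after each epoch, project the
    completion time from the last epoch's duration and adjust the
    number of CPU threads PyTorch may use."""
    if max_threads is None:
        max_threads = torch.get_num_threads()
    start = time.monotonic()
    threads = min_threads
    torch.set_num_threads(threads)
    for epoch in range(total_epochs):
        epoch_start = time.monotonic()
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        epoch_time = time.monotonic() - epoch_start
        remaining_budget = deadline_s - (time.monotonic() - start)
        projected = epoch_time * (total_epochs - epoch - 1)
        # Proportional-style step (assumed): add capacity when behind
        # schedule, release it when comfortably ahead.
        if projected > remaining_budget:
            threads = min(threads + 1, max_threads)
        elif projected < 0.8 * remaining_budget:
            threads = max(threads - 1, min_threads)
        torch.set_num_threads(threads)
```

A real controller in this setting would act on actual resource allocations (e.g., CPU quotas or GPU assignment) with a properly tuned control law rather than thread counts and a fixed slack factor.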
| File | Size | Format |
|---|---|---|
| SEAMS_25___Dynamic_Resource_Allocation_for_Deadline_Constrained__Neural_Network_Training-11.pdf (open access) | 448.58 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


