Resource Allocation in Flexible-Bandwidth Fine-Grained Optical Transport Networks for Geo-Distributed Machine Learning

Tornatore, Massimo;
2025-01-01

Abstract

Geo-distributed machine learning (GDML) enables collaborative learning among geographically dispersed data centers, meeting the demand for distributed, privacy-preserving training in large-scale Internet of Things applications. However, the efficiency of distributed training depends heavily on synchronized communication among the distributed models over bandwidth-limited wide area networks (WANs). The fine-grained optical transport network (fgOTN), thanks to its adjustable-bandwidth connections, offers more flexible transmission and can support accurate synchronization across GDML tasks in WANs. Yet flexible bandwidth assignment and complex interdependencies among tasks pose significant challenges to resource allocation for GDML in fgOTN. Specifically, flexible bandwidth assignment exacerbates resource competition among task flows, which reduces learning efficiency. This article proposes resource allocation solutions for GDML in fgOTN. We first formulate the problem as a linear program that maximizes the completion ratio of GDML tasks. We then propose a genetic-algorithm-based resource allocation algorithm (GARA) for GDML in fgOTN. GARA accounts for both task completion and bandwidth adjustment through population generation based on prior knowledge and adaptive mutation driven by the completion ratio. Simulations show that GARA prioritizes resource allocation for high-priority tasks to alleviate resource competition, achieving the highest task completion ratio while avoiding excessive network reconfiguration.
Fine-grained optical transport network (fgOTN)
geo-distributed machine learning (GDML)
network reconfiguration
optical network
resource allocation
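
The abstract names two ingredients of GARA: a population seeded from prior knowledge, and a mutation rate that adapts to the completion ratio. The toy sketch below illustrates only those two ideas and is not the authors' algorithm. It assumes a single link of fixed capacity, tasks that either receive their full bandwidth demand or are rejected, and a smallest-demand-first heuristic standing in for "prior knowledge". All names (`run_ga`, `greedy_seed`, the rate constants) are hypothetical, and priorities, routing, task interdependencies, and fgOTN reconfiguration cost are omitted.

```python
import random


def completion_ratio(admit, demands, capacity):
    """Fraction of tasks admitted; plans exceeding capacity score 0."""
    used = sum(d for a, d in zip(admit, demands) if a)
    return sum(admit) / len(admit) if used <= capacity else 0.0


def greedy_seed(demands, capacity):
    """Assumed 'prior knowledge': admit the smallest demands first."""
    admit, used = [0] * len(demands), 0
    for i in sorted(range(len(demands)), key=lambda i: demands[i]):
        if used + demands[i] <= capacity:
            admit[i], used = 1, used + demands[i]
    return admit


def mutate(admit, rate, rng):
    """Flip each admission bit independently with probability `rate`."""
    return [1 - a if rng.random() < rate else a for a in admit]


def crossover(p1, p2, rng):
    """Single-point crossover; requires at least two tasks."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]


def run_ga(demands, capacity, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)

    def fitness(ind):
        return completion_ratio(ind, demands, capacity)

    # Population generation: the greedy seed plus perturbed copies of it.
    start = greedy_seed(demands, capacity)
    pop = [start] + [mutate(start, 0.2, rng) for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)

    for _ in range(generations):
        # Adaptive mutation: explore more while the completion ratio is low.
        rate = 0.02 + 0.2 * (1.0 - fitness(best))
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - 1)
        ]
        pop = [best] + children  # elitism: the best plan always survives
        best = max(pop, key=fitness)

    return best, fitness(best)


if __name__ == "__main__":
    plan, ratio = run_ga([4, 2, 7, 3, 5, 1], capacity=10)
    print(plan, ratio)
```

Because the best individual is carried over unchanged each generation, the returned completion ratio can never fall below that of the greedy seed. A fuller model would replace the binary admit/reject gene with a per-flow bandwidth value and add a penalty for bandwidth adjustments.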
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1310592