A MARL Approach to Employ Intelligent Traffic Steering in SD-WAN
Giacometti L.;Selvamuthukumaran K.;Sguotti G.;Troia S.;Verticale G.
2025-01-01
Abstract
The network availability of business-critical applications is fundamental to reducing the risk of disruptions to the daily operations that rely on them. One way to address these contingencies is the adoption of the Software-Defined Wide Area Network (SD-WAN) paradigm, which optimizes network performance and reliability through intelligent traffic steering, thereby reducing the number of disruptions. In this work, we explore an SD-WAN scenario in which clients communicate with servers over channels implemented as overlay tunnels. Our goal is to improve network availability by dynamically rerouting the traffic flow between clients and servers onto the best-performing channel. This is accomplished by leveraging a multi-agent reinforcement learning (MARL) environment designed to handle incoming telemetry data, with network agents that learn and adapt their decision-making based on real-time feedback. Our results show that the proposed approach, based on the Double Deep Q-Network (DDQN) algorithm, outperforms an RTT-based greedy policy both in a single-agent scenario and in a setting where four agents are employed.
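The DDQN update mentioned in the abstract decouples action selection from action evaluation: the online network picks the next channel, while the target network scores that choice, mitigating the Q-value overestimation of plain DQN. The sketch below is a minimal illustration with hypothetical Q-values over three candidate channels, not the authors' implementation:

```python
import numpy as np

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the best next action
    (here, a channel/tunnel), while the target network evaluates it."""
    if done:
        return reward
    best_action = int(np.argmax(next_q_online))          # selection: online net
    return reward + gamma * next_q_target[best_action]   # evaluation: target net

# Hypothetical Q-values for three candidate channels:
q_online = np.array([0.2, 0.8, 0.5])   # online network's estimates
q_target = np.array([0.3, 0.6, 0.9])   # target network's estimates
y = ddqn_target(reward=1.0, next_q_online=q_online,
                next_q_target=q_target, gamma=0.9)
# online net selects channel 1; target net evaluates it: 1.0 + 0.9 * 0.6 = 1.54
```

Note that a plain DQN would instead use max(q_target) here (channel 2, yielding 1.81), illustrating the overestimation DDQN is designed to avoid.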


