Progressive datacenter recovery over optical core networks after a large-scale disaster

Tornatore, Massimo
2016

Abstract

Today's cloud systems are composed of geographically distributed datacenters interconnected by high-speed optical networks. Disaster failures can severely affect both the communication network and the datacenter infrastructure, preventing users from accessing cloud services. After large-scale disasters, recovering both the network and the datacenters may take days and, in some cases, weeks or months. Traditionally, the repair of the communication network has been treated as a problem separate from the repair of datacenters. While past research has mostly focused on network recovery, how to efficiently recover a cloud system while jointly considering limited computing and networking resources remains an important open research problem. In this work, we investigate the problem of progressive datacenter recovery after a large-scale disaster failure, given an existing network-recovery plan. We seek an efficient recovery plan that determines which datacenters should be recovered at each recovery stage so as to maximize cumulative content reachability from any source, considering the limited available network resources. We devise an Integer Linear Program (ILP) formulation to model the associated optimization problem. Our numerical examples using the ILP show that an efficient progressive datacenter-recovery plan can significantly increase the reachability of contents during the network-recovery phase. Compared to a random-recovery strategy, our approach increases the number of important contents reachable in the early stages of recovery, at the cost of a slight increase in resource consumption.
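The notion of cumulative content reachability can be illustrated with a small Python sketch. This is not the authors' ILP. It is a brute-force search over repair orders on a hypothetical three-datacenter instance, and it makes several simplifying assumptions: one datacenter is repaired per stage, each datacenter's site is reconnected at a stage fixed by the given network-recovery plan, and bandwidth or other resource limits are ignored. All names (`WEIGHTS`, `HOSTS`, `NETWORK_READY`, `cumulative_reachability`, `best_order`) and all numbers are illustrative assumptions.

```python
from itertools import permutations

# Hypothetical toy instance (not taken from the paper).
# Importance weight of each content item.
WEIGHTS = {"a": 5, "b": 3, "c": 1}

# Content items hosted by each datacenter (replicas are allowed).
HOSTS = {
    "DC1": {"a", "b"},
    "DC2": {"b", "c"},
    "DC3": {"a", "c"},
}

# Assumed stage at which the given network-recovery plan reconnects each site.
NETWORK_READY = {"DC1": 2, "DC2": 0, "DC3": 1}


def cumulative_reachability(order):
    """Sum over stages of the total weight of content held by at least one
    datacenter that is both repaired and network-connected at that stage.
    One datacenter is repaired per stage, following `order`."""
    total = 0
    repaired = set()
    for stage, dc in enumerate(order):
        repaired.add(dc)
        reachable = set()
        for d in repaired:
            if NETWORK_READY[d] <= stage:
                reachable |= HOSTS[d]
        total += sum(WEIGHTS[c] for c in reachable)
    return total


def best_order(datacenters):
    """Exhaustive search over repair orders. Feasible only for tiny
    instances, which is why a formulation such as an ILP is needed at scale."""
    return max(permutations(datacenters), key=cumulative_reachability)


if __name__ == "__main__":
    order = best_order(list(HOSTS))
    print(order, cumulative_reachability(order))  # ('DC2', 'DC3', 'DC1') 22
    # Expected score of a uniformly random repair order, for comparison.
    orders = list(permutations(HOSTS))
    avg = sum(map(cumulative_reachability, orders)) / len(orders)
    print(round(avg, 2))  # 16.67
```

On this toy instance, repairing DC2, then DC3, then DC1 gives a cumulative reachability of 22. The average over all orders, which is the expected score of a random order, is about 16.7. The difference comes from prioritizing datacenters that are reconnected early and hold high-weight content.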
Proceedings of the 2016 12th International Conference on the Design of Reliable Communication Networks, DRCN 2016
ISBN: 9781467384964
Subject areas: Safety, Risk, Reliability and Quality; Computer Networks and Communications; Control and Systems Engineering
Files in this item:
Ferdousi_DRCN_16.pdf (restricted access)
Description: Ferdousi_DRCN_2016
Type: Pre-print (pre-refereeing)
Size: 2.8 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1005413
Citations
  • PubMed Central: not available
  • Scopus: 3
  • Web of Science (ISI): 1