Software Aging Detection and Rejuvenation Assessment in Heterogeneous Virtual Networks

Camilli, Matteo;
2025-01-01

Abstract

In this article, we report on resiliency enforcement strategies applied to a microservices system running on a real-world deployment of a large cluster of heterogeneous Virtual Machines (VMs). We present evaluation results obtained from both measurement and modeling. The measurement infrastructure comprised 15 large and 15 extra-large VMs, and the modeling approach used Markov Decision Processes (MDPs). On the measurement testbed, we implemented three levels of software rejuvenation granularity to achieve software resiliency. We identified two threats to resiliency in this environment. The first was a memory leak in the underlying open-source infrastructure of each VM. The second was contention for resources in the physical host, which depends on the number and size of the VMs deployed to that host. In the MDP modeling approach, we evaluated four strategies for assigning tasks to VMs with different configurations and different levels of parallelism. Using the large cluster under study, we compared our approach, which uses software aging detection and rejuvenation, with the state-of-the-art approach of using a network of VMs deployed to a private cloud without software aging detection and rejuvenation. In summary, we show that in a private cloud with non-elastic resource allocation in the physical hosts, careful performance engineering is needed to optimize the trade-off between the number of VMs allocated and the total memory allocated to each VM.
2025
software aging
software rejuvenation
software resiliency
Files in this record:
File: Software_Aging_Detection_and_Rejuvenation_Assessment_in_Heterogeneous_Virtual_Networks.pdf
Access: open access
Version: Publisher’s version
Size: 4.84 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1297693