Software Aging Detection and Rejuvenation Assessment in Heterogeneous Virtual Networks
Camilli, Matteo;
2025-01-01
Abstract
In this article, we report on resiliency enforcement strategies applied to a microservices system running on a real-world deployment of a large cluster of heterogeneous Virtual Machines (VMs). We present evaluation results obtained from both measurement and modeling. The measurement infrastructure was composed of 15 large and 15 extra-large VMs. The modeling approach used Markov Decision Processes (MDPs). On the measurement testbed, we implemented three different levels of software rejuvenation granularity to achieve software resiliency. We discovered two threats to resiliency in this environment. The first was a memory leak in the underlying open-source infrastructure of each VM. The second was contention for resources in the physical host, which depends on the number and size of the VMs deployed to that host. In the MDP modeling approach, we evaluated four strategies for assigning tasks to VMs with different configurations and different levels of parallelism. Using the large cluster under study, we compared our approach, based on software aging detection and rejuvenation, with the state-of-the-art approach of using a network of VMs deployed to a private cloud without software aging detection and rejuvenation. In summary, we show that in a private cloud with non-elastic resource allocation in the physical hosts, careful performance engineering is needed to optimize the trade-offs between the number of VMs allocated and the total memory allocated to each VM.

| File | Size | Format |
|---|---|---|
| Software_Aging_Detection_and_Rejuvenation_Assessment_in_Heterogeneous_Virtual_Networks.pdf (open access, Publisher’s version) | 4.84 MB | Adobe PDF |


