Optimizing Quality-Aware Big Data Applications in the Cloud

Gianniti, E.; Ciavotta, M.; Ardagna, D.

doi:10.1109/TCC.2018.2874944

Recent years have witnessed a steep rise in data generation worldwide and, consequently, the widespread adoption of software solutions able to support data-intensive applications. Competitiveness and innovation have strongly benefited from these new platforms and methodologies, and there is a great deal of interest in the new possibilities that Big Data analytics promises to make a reality. Many companies currently engage in data-intensive processes as part of their core businesses; however, fully embracing the data-driven paradigm is still cumbersome, and establishing a production-ready, fine-tuned deployment is time-consuming, expensive, and resource-intensive. This situation calls for innovative models and techniques to streamline the process of deployment configuration for Big Data applications. In particular, this paper focuses on the rightsizing of Cloud-deployed clusters, which represent a cost-effective alternative to on-premises installation. It proposes a novel tool, integrated in a wider DevOps-inspired approach, that implements a parallel and distributed simulation-optimization technique to explore the space of alternative Cloud configurations efficiently and effectively, seeking the minimum-cost deployment that satisfies quality of service constraints. The soundness of the proposed solution has been thoroughly validated in a vast experimental campaign encompassing different applications and Big Data platforms.
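
The rightsizing problem named in the abstract can be illustrated with a minimal sketch. This is not the tool's actual algorithm, which the abstract does not detail: the VM catalog, its prices and speeds, and the function `simulate_runtime` are all hypothetical, and the toy runtime formula merely stands in for a real performance simulator. The sketch assumes that runtime never increases as nodes are added, so the smallest feasible cluster of each VM type is also its cheapest, and it evaluates the VM types in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class VMType:
    name: str
    hourly_cost: float  # price of one VM per hour (made-up figure)
    speed: float        # relative per-node processing speed (made-up figure)


def simulate_runtime(vm: VMType, nodes: int,
                     work: float = 1200.0, overhead: float = 30.0) -> float:
    """Hypothetical stand-in for a simulator: fixed overhead plus evenly split work."""
    return overhead + work / (vm.speed * nodes)


def cheapest_for_type(vm: VMType, deadline: float,
                      max_nodes: int = 64) -> Optional[Tuple[float, str, int]]:
    """Return (hourly cost, VM name, nodes) for the smallest cluster meeting the deadline."""
    for nodes in range(1, max_nodes + 1):
        if simulate_runtime(vm, nodes) <= deadline:
            return (vm.hourly_cost * nodes, vm.name, nodes)
    return None  # no cluster of this type satisfies the constraint


def optimize(vm_types: List[VMType],
             deadline: float) -> Optional[Tuple[float, str, int]]:
    """Search each VM type in parallel, then keep the cheapest feasible deployment."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda vm: cheapest_for_type(vm, deadline), vm_types))
    feasible = [r for r in results if r is not None]
    return min(feasible) if feasible else None


# Illustrative catalog; names and numbers are assumptions, not data from the paper.
CATALOG = [VMType("small", 0.10, 1.0), VMType("large", 0.35, 4.0)]
best = optimize(CATALOG, deadline=90.0)
```

With this made-up catalog and a deadline of 90 time units, `best` selects five `large` nodes over twenty `small` ones. A deadline shorter than the fixed overhead makes every configuration infeasible, and `optimize` returns `None`.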