The continuous search for the k best results given a query (top-k) in a data stream is a problem that has gained a lot of attention in recent years, especially in social media and IoT contexts. In these contexts it is essential that the system is able to respond reactively. Therefore, in this study we address how to scale the monitoring of approximate continuous top-k queries to guarantee the reactiveness of the system. In this paper we introduce a scalable distributed version of the MinTopK+N algorithm that allows the system to remain reactive when the load increases. In addition, we present a scale-out/in algorithm that allows the system to add instances to cope with workload increments and remove instances to avoid over-provisioning. Furthermore, in a set of managed experiments, we demonstrated how the increase in the workload could affect the system performance and the impact of the instant of triggering scaling actions on the system performance. Finally, we present a study on the impact of the scaling action on the correctness of the result, showing that the scaling algorithm does not negatively impact the correctness of the results, and in certain conditions, it can even improve the quality of the approximation.

Scaling the monitoring of approximate top-k queries in streaming windows

Zahmatkesh, Shima;Valle, Emanuele Della
2021-01-01

Abstract

The continuous search for the k best results given a query (top-k) in a data stream is a problem that has gained a lot of attention in recent years, especially in social media and IoT contexts. In these contexts it is essential that the system is able to respond reactively. Therefore, in this study we address how to scale the monitoring of approximate continuous top-k queries to guarantee the reactiveness of the system. In this paper we introduce a scalable distributed version of the MinTopK+N algorithm that allows the system to remain reactive when the load increases. In addition, we present a scale-out/in algorithm that allows the system to add instances to cope with workload increments and remove instances to avoid over-provisioning. Furthermore, in a set of managed experiments, we demonstrated how the increase in the workload could affect the system performance and the impact of the instant of triggering scaling actions on the system performance. Finally, we present a study on the impact of the scaling action on the correctness of the result, showing that the scaling algorithm does not negatively impact the correctness of the results, and in certain conditions, it can even improve the quality of the approximation.
2021
2021 IEEE International Conference on Big Data, Big Data 2021
978-1-6654-3902-2
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1203212
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 1
social impact