Scaling the monitoring of approximate top-k queries in streaming windows

Ramperez, Victor; Zahmatkesh, Shima; Valle, Emanuele Della

doi:10.1109/BigData52589.2021.9672041

The continuous search for the k best results given a query (top-k) in a data stream is a problem that has gained a lot of attention in recent years, especially in social media and IoT contexts. In these contexts it is essential that the system is able to respond reactively. Therefore, in this study we address how to scale the monitoring of approximate continuous top-k queries to guarantee the reactiveness of the system. In this paper we introduce a scalable distributed version of the MinTopK+N algorithm that allows the system to remain reactive when the load increases. In addition, we present a scale-out/in algorithm that allows the system to add instances to cope with workload increments and remove instances to avoid over-provisioning. Furthermore, in a set of managed experiments, we demonstrated how the increase in the workload could affect the system performance and the impact of the instant of triggering scaling actions on the system performance. Finally, we present a study on the impact of the scaling action on the correctness of the result, showing that the scaling algorithm does not negatively impact the correctness of the results, and in certain conditions, it can even improve the quality of the approximation.