StreamGen: Model-Driven Development of Distributed Streaming Applications

Damian Andrew Tamburri; Elisabetta Di Nitto
2021-01-01

Abstract

Distributed streaming applications, i.e., applications that process massive streams of data in a distributed fashion, are becoming increasingly popular as a way to tame the velocity and volume of Big Data. Nevertheless, the widespread adoption of data-intensive processing is still limited by two factors: the non-trivial design paradigms involved, which must deal with the unboundedness and volume of the data streams, and the many distributed streaming platforms, each with its own characteristics and APIs. In this article, we present StreamGen, a Model-Driven Engineering tool that simplifies the design of such streaming applications and automatically generates fully working, processing-ready code for different target platforms (e.g., Apache Spark, Apache Flink). Evaluation shows that (i) StreamGen is general enough to model an application and generate its code, with performance comparable to that of a similar, preexisting, well-known application; (ii) the tool is fully compliant with the streaming concepts defined as part of the Google Dataflow Model; and (iii) users with little computer science background and limited experience with big data were able to work with StreamGen and create or refactor an application in a matter of minutes.
Keywords: streaming applications, model-driven engineering, big data architectures
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1159231