TSpoon: Transactions on a stream processor

Lorenzo Affetti;Alessandro Margara;Gianpaolo Cugola
2020-01-01

Abstract

Stream processing systems are increasingly becoming a core element in the data processing stack of many large companies, where they complement data management frameworks to build comprehensive solutions for processing, storage, and querying. The adoption of separate tools leads to complex architectures that leave developers with the difficult task of writing application-specific code to ensure that the integration is correct. This hinders design, implementation, maintenance, and evolution. We address this problem with a new model that seamlessly integrates data management capabilities within a distributed stream processor. The model makes the state of stream processing operators externally visible and queryable, providing transactional guarantees for state accesses and updates. It enables developers to configure transactions, obtaining strong guarantees when needed and relaxing them for higher performance when possible. We introduce the new model and formalize the transactional guarantees it offers. We discuss the implementation of the model in the TSpoon tool and experiment with different algorithms to enforce transactional behavior. We evaluate the performance of TSpoon with real-world case studies and synthetic workloads, compare it with state-of-the-art tools for distributed in-memory stream processing and data management, and analyze in detail the cost of ensuring various transactional semantics.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1146707