Tenet: Benchmarking Data Stream Classifiers in Presence of Temporal Dependence

Ziffer, Giacomo; Giannini, Federico; Della Valle, Emanuele
2024-01-01

Abstract

Batch learning commonly assumes that samples are independent and identically distributed (i.i.d.). This assumption does not hold in dynamic environments, where concept drift means that data streams are not identically distributed. Moreover, although most Streaming Machine Learning (SML) literature assumes that examples are independent, data streams often carry important temporal structure that learning should take into account. Neglecting this temporal dependence can seriously mislead the design and evaluation of SML models. To support this thesis, we propose Tenet, a novel benchmarking framework for comparatively evaluating data stream classifiers in non-i.i.d. scenarios. Tenet consists of a data stream generator and a baseline. The generator introduces temporal dependence into the data streams commonly used to evaluate SML algorithms. The baseline, cLSTM, is a continuous version of the Long Short-Term Memory algorithm. Extensive experiments with Tenet show that cLSTM consistently outperforms state-of-the-art SML classifiers when learning from data streams with temporal dependence. This result is a call to action for the SML and Deep Learning communities to investigate classifiers in the time-dependent streaming scenario, and it makes Tenet the first publicly available benchmark to support this research.
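To make the notion of temporal dependence concrete, the following minimal sketch contrasts an i.i.d. label stream with one in which each label tends to repeat its predecessor. It is an illustration only and does not reproduce Tenet's generator: the first-order Markov scheme, the function names, and the `stay_prob` parameter are all assumptions introduced here.

```python
import random


def iid_labels(n, seed=0):
    """Binary labels drawn independently: no temporal dependence."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]


def dependent_labels(n, stay_prob=0.9, seed=0):
    """Binary labels from a first-order Markov chain (hypothetical scheme).

    Each label repeats the previous one with probability `stay_prob`,
    so consecutive labels are correlated. This is NOT Tenet's generator.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1)]
    for _ in range(n - 1):
        prev = labels[-1]
        labels.append(prev if rng.random() < stay_prob else 1 - prev)
    return labels


def lag1_agreement(labels):
    """Fraction of positions whose label equals the preceding label."""
    same = sum(a == b for a, b in zip(labels, labels[1:]))
    return same / (len(labels) - 1)


if __name__ == "__main__":
    n = 10_000
    print("i.i.d. lag-1 agreement:   ", round(lag1_agreement(iid_labels(n)), 3))
    print("dependent lag-1 agreement:", round(lag1_agreement(dependent_labels(n)), 3))
```

The lag-1 agreement equals the accuracy of a trivial predictor that simply repeats the previous label. On the dependent stream that accuracy is close to `stay_prob`, which suggests one way in which an evaluation that ignores temporal dependence could be misleading.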
Year: 2024
Venue: 2024 IEEE International Conference on Big Data (BigData)
Keywords: Benchmark; Concept Drift; Data Stream; Temporal Dependence
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1287728