RE.PUBLIC@POLIMI Research publications of the Politecnico di Milano

Tenet: Benchmarking Data Stream Classifiers in Presence of Temporal Dependence

Ziffer, Giacomo; Giannini, Federico; Della Valle, Emanuele

2024-01-01

Abstract

In batch learning, it is commonly assumed that samples are independent and identically distributed (i.i.d.). However, this assumption does not hold in dynamic environments, where data streams are not identically distributed because of concept drift. Furthermore, while most Streaming Machine Learning (SML) literature assumes independence among examples, data streams often have important temporal components that learning should adequately consider. Neglecting this temporal dependence can significantly mislead the design and evaluation of SML models. To support this thesis, we propose Tenet, a novel benchmarking framework for comparatively evaluating data stream classifiers in non-i.i.d. scenarios. Tenet consists of a data stream generator and a baseline. The data stream generator introduces temporal dependence into the data streams commonly used to evaluate SML algorithms. The baseline, called cLSTM, is a continuous version of the Long Short-Term Memory algorithm. Extensive experiments using Tenet demonstrate that cLSTM consistently outperforms state-of-the-art SML classifiers when learning from data streams with temporal dependence. This result is a call to action for the SML and Deep Learning communities to investigate classifiers in the time-dependent streaming scenario, and Tenet is the first publicly available benchmark to support this research.
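The abstract's central point, that temporal dependence changes how stream classifiers should be evaluated, can be illustrated with a toy example. The sketch below is NOT Tenet's generator or the cLSTM baseline; it is a hypothetical Python illustration, assuming a binary label that follows a two-state Markov chain and a single feature that is pure noise. It compares two trivial classifiers under prequential (test-then-train) evaluation, a common SML protocol: a majority-class classifier, which ignores the order of examples, and a no-change classifier, which simply repeats the previous label. The names `temporally_dependent_stream`, `prequential_accuracy`, `evaluate`, and `p_stay` are all illustrative.

```python
import random
from collections import Counter


def temporally_dependent_stream(n, p_stay=0.9, seed=42):
    """Yield (x, y) pairs whose labels follow a two-state Markov chain.

    Hypothetical toy generator (not Tenet's): with probability p_stay the
    label repeats the previous one, so consecutive examples are NOT
    independent. The feature x is pure noise and carries no information.
    """
    rng = random.Random(seed)
    y = rng.randint(0, 1)
    for _ in range(n):
        if rng.random() >= p_stay:
            y = 1 - y  # switch class with probability 1 - p_stay
        x = rng.gauss(0.0, 1.0)
        yield x, y


def prequential_accuracy(stream, predict, learn):
    """Test-then-train: predict each example before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(predict(x) == y)
        learn(x, y)
        total += 1
    return correct / total


def evaluate(n=5000, p_stay=0.9, seed=42):
    """Return (majority-class accuracy, no-change accuracy) on the same stream."""
    # Majority-class classifier: predicts the most frequent label seen so far.
    counts = Counter()
    majority = prequential_accuracy(
        temporally_dependent_stream(n, p_stay, seed),
        predict=lambda x: counts.most_common(1)[0][0] if counts else 0,
        learn=lambda x, y: counts.update([y]),
    )
    # No-change classifier: predicts the label of the previous example.
    last = {"y": 0}
    no_change = prequential_accuracy(
        temporally_dependent_stream(n, p_stay, seed),
        predict=lambda x: last["y"],
        learn=lambda x, y: last.update(y=y),
    )
    return majority, no_change


if __name__ == "__main__":
    maj, nc = evaluate()
    print(f"majority-class: {maj:.3f}  no-change: {nc:.3f}")
```

With `p_stay=0.9` the no-change classifier should score close to 0.9 while the majority-class classifier stays near chance, even though the feature is uninformative; setting `p_stay=0.5` removes the dependence and the gap disappears. A benchmark that ignores such dependence can therefore over- or under-rate classifiers for reasons unrelated to what they learn from the features.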

Publication year: 2024

Book title: 2024 IEEE International Conference on Big Data (BigData)

Keywords:
Benchmark
Concept Drift
Data Stream
Temporal Dependence

Appears in item types: 04.1 Contribution in conference proceedings

Files in this item:

No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1287728
