Incremental rebalancing learning on evolving data streams

Bernardo, Alessio; della Valle, Emanuele
2020-01-01

Abstract

Nowadays, every device connected to the Internet generates an ever-growing (formally, unbounded) stream of data. Machine learning on data streams is a grand challenge because of its resource constraints. Standard machine learning techniques cannot cope with data whose statistics undergo gradual or sudden changes without warning (formally, concept drift). Massive Online Analysis (MOA) is the collective name, as well as a software library, for new learners that can manage data streams. In this paper, we present a research study on streaming rebalancing. Like static data, data streams can be imbalanced, but no method exists to rebalance them incrementally. For this reason, we propose a new streaming approach that rebalances data streams online. We evaluate the new methodology on synthetically generated datasets using prequential evaluation to demonstrate that it outperforms existing approaches.
2020
20th International Conference on Data Mining Workshops, ICDM Workshops 2020, Sorrento, Italy, November 17-20, 2020
9781728190129
Evolving Data Stream, Streaming, Concept Drift, MOA, Balancing
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1161873