RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano

C-SMOTE: Continuous Synthetic Minority Oversampling for Evolving Data Streams

Bernardo, Alessio;Gomes, Heitor Murilo;Montiel, Jacob;Pfahringer, Bernhard;Bifet, Albert;Valle, Emanuele Della

2020-01-01

Abstract

Streaming Machine Learning (SML) studies single-pass learning algorithms that update their models one data item at a time, given an unbounded and often non-stationary flow of data (i.e., in the presence of concept drift). Online class imbalance learning is a branch of SML that combines the challenges of both class imbalance and concept drift. In this paper, we investigate the binary classification problem of rebalancing an imbalanced stream of data in the presence of concept drift, accessing one sample at a time. We propose the Continuous Synthetic Minority Oversampling Technique (C-SMOTE), a novel rebalancing meta-strategy to pipeline with SML classification algorithms. C-SMOTE is inspired by the popular SMOTE algorithm but operates continuously. We benchmark C-SMOTE pipelines on ten different groups of data streams. We provide empirical evidence that models learnt with C-SMOTE pipelines outperform models trained on imbalanced data streams, without losing the ability to deal with concept drift. Moreover, we show that they outperform other stream balancing techniques from the literature.
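
The abstract does not spell out how C-SMOTE works, so the following is only a rough illustration of the general idea it names: SMOTE-style interpolation between minority samples, applied one sample at a time. It is not the authors' algorithm. The class `WindowedOversampler`, the fixed-size window, the known `minority_label`, and the "generate until the emitted counts match" rule are all assumptions made for this sketch.

```python
import random
from collections import deque


def smote_interpolate(x, neighbor, rng):
    """Classic SMOTE step: a random point on the segment from x to neighbor."""
    u = rng.random()  # u in [0, 1)
    return [a + u * (b - a) for a, b in zip(x, neighbor)]


def nearest_neighbors(x, pool, k):
    """Up to k points of pool closest to x (squared Euclidean), excluding x itself."""
    others = [p for p in pool if p is not x]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
    return others[:k]


class WindowedOversampler:
    """Hypothetical streaming rebalancer (illustration only, not C-SMOTE)."""

    def __init__(self, minority_label=1, window_size=100, k=5, seed=0):
        self.minority_label = minority_label      # assumed known in advance
        self.window = deque(maxlen=window_size)   # recent real minority samples
        self.n_minority = 0                       # real + synthetic minority emitted
        self.n_majority = 0                       # majority samples seen
        self.k = k
        self.rng = random.Random(seed)

    def process(self, x, y):
        """Consume one labelled sample; return the (features, label) pairs to train on."""
        out = [(x, y)]
        if y == self.minority_label:
            self.window.append(x)
            self.n_minority += 1
        else:
            self.n_majority += 1
        # Emit synthetic minority samples until the two classes are balanced.
        while self.n_minority < self.n_majority and len(self.window) >= 2:
            base = self.rng.choice(self.window)
            neighbor = self.rng.choice(nearest_neighbors(base, self.window, self.k))
            out.append((smote_interpolate(base, neighbor, self.rng), self.minority_label))
            self.n_minority += 1
        return out
```

In a pipeline, each pair returned by `process` would be passed to the incremental learner's update step, so the classifier sees a roughly balanced stream while the window keeps only recent minority samples.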

Year of publication: 2020
Book title: 2020 IEEE International Conference on Big Data (Big Data)
Series title: ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA
ISBN (International Standard Book Number): 978-1-7281-6251-5; 978-1-7281-6252-2
Keywords: Streaming data; Concept Drift; Balancing; Binary Classification
Appears in types: 04.1 Contribution in conference proceedings

Files in this record:

File	Access and version	Size	Format
C_SMOTE_IEEE_BigData_2020 (1).pdf	Restricted access: Publisher’s version	269.97 kB	Adobe PDF
11311-1166189_Bernardo.pdf	Open access: Post-print (draft or Author’s Accepted Manuscript, AAM)	279.31 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1166189
