The world is constantly changing, and so are the massive amount of data produced. However, only a few studies deal with online class imbalance learning that combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy to be prepended to any streaming machine learning classification algorithm aiming at oversampling the minority class using a new version of SMOTE and BORDERLINE-SMOTE inspired by Data Sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority class performances are better than state-of-the-art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed.
VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams
Bernardo, Alessio;Della Valle, Emanuele
2021-01-01
Abstract
The world is constantly changing, and so are the massive amount of data produced. However, only a few studies deal with online class imbalance learning that combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy to be prepended to any streaming machine learning classification algorithm aiming at oversampling the minority class using a new version of SMOTE and BORDERLINE-SMOTE inspired by Data Sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority class performances are better than state-of-the-art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed.File | Dimensione | Formato | |
---|---|---|---|
VFC_SMOTE_Data_Mining_and_Knowledge_Discovery_journal_SI__ECMLPKDD2021.pdf
accesso aperto
Descrizione: Articolo principale
:
Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione
4.87 MB
Formato
Adobe PDF
|
4.87 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.