SMOTE-OB: Combining SMOTE and Online Bagging for Continuous Rebalancing of Evolving Data Streams
Bernardo, Alessio; Della Valle, Emanuele
2021-01-01
Abstract
The world is constantly changing, and so is the massive amount of data it produces. However, only a few studies deal with online class imbalance learning, which combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the Synthetic Minority Oversampling TEchnique with Online Bagging (SMOTE-OB). It is a novel cost-sensitive ensemble strategy that uses Online Bagging and a new sketched version of SMOTE to oversample the minority class and undersample the majority class. We benchmarked SMOTE-OB on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We provide statistical evidence that the SMOTE-OB ensemble achieves better minority class performance than state-of-the-art methods. Moreover, we perform a time and memory consumption analysis.