IWDA: Importance Weighting for Drift Adaptation in Streaming Supervised Learning Problems

Fedeli Filippo;Metelli Alberto Maria;Trovo' Francesco;Restelli Marcello
2023-01-01

Abstract

Distribution drift is an important issue for practical applications of machine learning (ML). In particular, in streaming ML, the data distribution may change over time, yielding the problem of concept drift, which degrades the performance of learners trained on outdated data. In this article, we focus on supervised problems in an online nonstationary setting and introduce a novel learner-agnostic algorithm for drift adaptation, namely importance weighting for drift adaptation (IWDA), with the goal of efficiently retraining the learner when drift is detected. IWDA incrementally estimates the joint probability density of input and target for the incoming data and, as soon as drift is detected, retrains the learner using importance-weighted empirical risk minimization. The importance weights are computed for all the samples observed so far from the estimated densities, thus using all available information efficiently. After presenting our approach, we provide a theoretical analysis in the abrupt drift setting. Finally, we present numerical simulations that illustrate how IWDA competes with, and often outperforms, state-of-the-art stream learning techniques, including adaptive ensemble methods, on both synthetic and real-world data benchmarks.
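
The retraining step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the density estimator, the drift detector, or the learner, so everything below is an assumption made for illustration. The sketch assumes an abrupt drift at a known position in the stream (no detector is implemented), a fixed-bandwidth Gaussian kernel density estimate of the joint density of input and target, and a linear learner fitted by weighted least squares. All function names and parameter values are hypothetical.

```python
import numpy as np


def kde_density(train, query, bandwidth):
    """Isotropic Gaussian KDE fitted on `train`, evaluated at `query` (assumed estimator)."""
    train = np.atleast_2d(train)
    query = np.atleast_2d(query)
    dim = train.shape[1]
    # Squared distance between every query point and every training point.
    sq_dist = np.sum((query[:, None, :] - train[None, :, :]) ** 2, axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2).mean(axis=1) / norm


def importance_weights(old_xy, new_xy, bandwidth=0.2, eps=1e-12):
    """Weights p_new(x, y) / p_old(x, y) for the samples seen before the drift."""
    p_new = kde_density(new_xy, old_xy, bandwidth)
    p_old = kde_density(old_xy, old_xy, bandwidth)
    return p_new / (p_old + eps)


def weighted_least_squares(x, y, w):
    """Importance-weighted ERM for a linear model with squared loss."""
    design = np.column_stack([x, np.ones_like(x)])
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef  # (slope, intercept)


rng = np.random.default_rng(0)

# Old concept: y = 2x + noise. New concept after an abrupt drift: y = -x + noise.
x_old = rng.uniform(-1.0, 1.0, 500)
y_old = 2.0 * x_old + 0.1 * rng.normal(size=x_old.size)
x_new = rng.uniform(-1.0, 1.0, 60)
y_new = -1.0 * x_new + 0.1 * rng.normal(size=x_new.size)

old_xy = np.column_stack([x_old, y_old])
new_xy = np.column_stack([x_new, y_new])

# Old samples are reweighted by the density ratio; new samples keep weight 1.
w_old = importance_weights(old_xy, new_xy)
x_all = np.concatenate([x_old, x_new])
y_all = np.concatenate([y_old, y_new])
w_all = np.concatenate([w_old, np.ones(x_new.size)])

w_slope, w_intercept = weighted_least_squares(x_all, y_all, w_all)
u_slope, u_intercept = weighted_least_squares(x_all, y_all, np.ones(x_all.size))

print("importance-weighted slope:", w_slope)
print("unweighted slope:", u_slope)
```

In this toy setting, old samples that are unlikely under the post-drift density receive weights close to zero, while old samples lying where the two joint densities overlap still contribute to the fit. The result depends on the chosen bandwidth and on the number of post-drift samples available for density estimation.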
Keywords: concept drift; data drift; drift adaptation; importance weighting (IW); nonstationarity; stream learning

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1236403