
(ϵ, u)-Adaptive Regret Minimization in Heavy-Tailed Bandits

Genalti G.; Gatti N.; Metelli A. M.
2024-01-01

Abstract

Heavy-tailed distributions naturally arise in several settings, from finance to telecommunications. While regret minimization under subgaussian or bounded rewards has been widely studied, learning with heavy-tailed distributions has only gained popularity over the last decade. In this paper, we consider the setting in which the reward distributions have finite absolute raw moments of maximum order 1 + ϵ, uniformly bounded by a constant u < +∞, for some ϵ ∈ (0, 1]. In this setting, we study the regret minimization problem when ϵ and u are unknown to the learner, who therefore has to adapt to them. First, we show that adaptation comes at a cost and derive two negative results proving that the regret guarantees of the non-adaptive case cannot be achieved without further assumptions. Then, we devise and analyze a fully data-driven trimmed mean estimator and propose a novel adaptive regret minimization algorithm, AdaR-UCB, that leverages this estimator. Finally, we show that AdaR-UCB is the first algorithm that, under a known distributional assumption, enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
Proceedings of Machine Learning Research, 2024
Keywords: adaptivity; bandits; heavy-tailed distributions
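For context only, the sketch below illustrates the classical truncated (trimmed) mean estimator used in non-adaptive heavy-tailed bandit analyses, where ϵ and u must be known in advance; the paper's contribution is precisely a fully data-driven variant (and the AdaR-UCB algorithm built on it) that removes this knowledge requirement. Function names, parameter choices, and the constant in the confidence radius are illustrative assumptions, not the paper's definitions.

import numpy as np

def truncated_mean(samples, epsilon, u, delta):
    """Classical truncated (trimmed) mean for heavy-tailed samples.

    Assumes E[|X|^(1+epsilon)] <= u with epsilon and u KNOWN
    (the non-adaptive setting): samples whose magnitude exceeds a
    growing threshold are discarded before averaging.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    idx = np.arange(1, n + 1)
    # Per-sample truncation threshold; it grows with the sample index.
    thresholds = (u * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + epsilon))
    kept = np.where(np.abs(samples) <= thresholds, samples, 0.0)
    return kept.sum() / n

def confidence_radius(n, epsilon, u, delta):
    """Width of the matching upper-confidence bonus (up to constants)."""
    return (4.0 * u ** (1.0 / (1.0 + epsilon))
            * (np.log(1.0 / delta) / n) ** (epsilon / (1.0 + epsilon)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed (Pareto-type) samples; epsilon and u values are illustrative.
    x = rng.pareto(1.5, size=2000)
    print(truncated_mean(x, epsilon=0.4, u=5.0, delta=0.05))
    print(confidence_radius(len(x), epsilon=0.4, u=5.0, delta=0.05))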


Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1286485
Citations: Scopus 5