RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

The (expected cumulative) regret is the customary index to judge the performance of online sequential decision-making algorithms. In the traditional form, it is defined as the expected sum over a learning horizon T of the sub-optimality gaps triangle It (i.e., expected instantaneous regret) the agent suffers when playing arm It at round t. In this paper, we propose and investigate a generalization of this notion, named g-(expected cumulative) regret, obtained by applying a transformation function g to the sub-optimality gaps, making the agent suffer g(triangle(It)) instead of just triangle(It). Intuitively, function g embeds the "perception" that the agent manifests when performing a sub-optimal decision. We first show that sublinear g-regret is not achievable for a generic transformation function g. Then, we introduce a mild condition on g and provide instance-dependent and worst-case (i.e., minimax) lower bounds for the g-regret. Finally, we show that state-of-the-art stochastic bandit algorithms with no modification surprisingly display optimal performances for the g-regret. Specifically, we prove that UCB1 matches (up to constant factors) the instance-dependent lower bound regardless of function g and that MOSS matches (up to constant factors) the minimax lower bound at least for a wide class of transformation functions.

Generalizing the Regret: an Analysis of Lower and Upper Bounds

Mussi M.;Metelli A. M.

2025-01-01

Abstract

The (expected cumulative) regret is the customary index to judge the performance of online sequential decision-making algorithms. In the traditional form, it is defined as the expected sum over a learning horizon T of the sub-optimality gaps triangle It (i.e., expected instantaneous regret) the agent suffers when playing arm It at round t. In this paper, we propose and investigate a generalization of this notion, named g-(expected cumulative) regret, obtained by applying a transformation function g to the sub-optimality gaps, making the agent suffer g(triangle(It)) instead of just triangle(It). Intuitively, function g embeds the "perception" that the agent manifests when performing a sub-optimal decision. We first show that sublinear g-regret is not achievable for a generic transformation function g. Then, we introduce a mild condition on g and provide instance-dependent and worst-case (i.e., minimax) lower bounds for the g-regret. Finally, we show that state-of-the-art stochastic bandit algorithms with no modification surprisingly display optimal performances for the g-regret. Specifically, we prove that UCB1 matches (up to constant factors) the instance-dependent lower bound regardless of function g and that MOSS matches (up to constant factors) the minimax lower bound at least for a wide class of transformation functions.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2025
			
	Titolo della rivista
	
				THE JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
paper.pdf accesso aperto Descrizione: Paper : Publisher’s version Dimensione 649.25 kB Formato Adobe PDF Visualizza/Apri	649.25 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1307647

Citazioni

ND

1

1

social impact