Model-Free Non-Stationarity Detection and Adaptation in Reinforcement Learning

Giuseppe Canonaco; Marcello Restelli; Manuel Roveri
2020-01-01

Abstract

In most Reinforcement Learning (RL) studies, the considered task is assumed to be stationary, i.e., it does not change its behavior or its characteristics over time, since this assumption underpins the convergence guarantees of RL techniques. Unfortunately, the assumption does not hold in real-world scenarios, where systems and environments typically evolve over time. For instance, in robotic applications a sensor or actuator fault induces a sudden change in the RL setting, while in financial applications the evolution of the market causes a more gradual variation over time. In this paper, we present an adaptive RL algorithm able to detect changes in the environment or in the reward function and to react to them by adapting to the new conditions of the task. First, we develop a figure of merit onto which a hypothesis test can be applied to detect changes between two different learning iterations. Then, we extend this test to operate sequentially over time by means of the CUmulative SUM (CUSUM) approach. Finally, the proposed change-detection mechanism is combined (following an adaptive-active approach) with a well-known RL algorithm, enabling it to deal with non-stationary tasks. We test the proposed algorithm on two well-known continuous-control tasks to assess its effectiveness, in terms of non-stationarity detection and adaptation, over a vanilla RL algorithm.
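As a rough illustration of the sequential detection idea mentioned in the abstract (not the authors' exact figure of merit or test), a two-sided CUSUM detector over a stream of per-iteration statistics might be sketched as below; the statistic stream, the reference mean, the drift term, and the threshold are all hypothetical placeholders chosen for this example.

```python
import numpy as np

def cusum_detector(stats, target_mean, drift=0.0, threshold=5.0):
    """Two-sided CUSUM test over a stream of per-iteration statistics.

    `stats` is a hypothetical sequence of figures of merit collected at each
    learning iteration; `target_mean` is their expected value under the
    no-change hypothesis. Returns the index of the first alarm, or None.
    """
    g_pos, g_neg = 0.0, 0.0
    for t, s in enumerate(stats):
        deviation = s - target_mean
        g_pos = max(0.0, g_pos + deviation - drift)  # accumulates upward shifts
        g_neg = max(0.0, g_neg - deviation - drift)  # accumulates downward shifts
        if g_pos > threshold or g_neg > threshold:
            return t  # change detected at iteration t
    return None

# Usage sketch: a synthetic stream whose mean shifts halfway through.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
print(cusum_detector(stream, target_mean=0.0, drift=0.5, threshold=5.0))
```

In an adaptive-active scheme of the kind the paper describes, an alarm from such a detector would trigger the RL algorithm to adapt to the new task conditions (e.g., by resetting or re-weighting what it has learned); the specific reaction used by the authors is detailed in the paper itself.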
2020
[Proceedings of the 24th European Conference on Artificial Intelligence - ECAI 2020]
non-stationary reinforcement learning
non-stationary environments


Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1146220
Citations
  • Scopus: 9
  • Web of Science: 4