Importance Sampling Techniques for Policy Optimization

Alberto Maria Metelli; Matteo Papini; Nico Montali; Marcello Restelli
2020-01-01

Abstract

How can we effectively exploit the collected samples when solving a continuous control task with Reinforcement Learning? Recent results have empirically demonstrated that multiple policy optimization steps can be performed with the same batch by using off-distribution techniques based on importance sampling. However, when dealing with off-distribution optimization, it is essential to take into account the uncertainty introduced by the importance sampling process. In this paper, we propose and analyze a class of model-free, policy search algorithms that extend the recent Policy Optimization via Importance Sampling (Metelli et al., 2018) by incorporating two advanced variance reduction techniques: per-decision and multiple importance sampling. For both of them, we derive a high-probability bound, of independent interest, and then we show how to employ it to define a suitable surrogate objective function that can be used for both action-based and parameter-based settings. The resulting algorithms are finally evaluated on a set of continuous control tasks, using both linear and deep policies, and compared with modern policy optimization methods.
Keywords

Reinforcement Learning
Policy Optimization
Importance Sampling
Per-Decision Importance Sampling
Multiple Importance Sampling
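
Illustrative sketch of the two variance-reduction estimators named above, per-decision and multiple importance sampling. This is not the authors' code: the function names, the synthetic input shapes, and the NumPy-based formulation are assumptions made purely for exposition.

    # Minimal sketch (not the paper's implementation) of per-decision IS and
    # multiple IS with the balance heuristic. All names are illustrative.
    import numpy as np

    def pdis_return_estimate(log_ratios, rewards, gamma=0.99):
        """Per-decision IS estimate of the expected discounted return.

        log_ratios: array (N, T) of log pi_target(a_t|s_t) - log pi_behavior(a_t|s_t)
        rewards:    array (N, T) of per-step rewards collected under pi_behavior
        Each reward r_t is weighted only by the product of ratios up to step t,
        instead of the full-trajectory product used by plain importance sampling.
        """
        cum_log_ratios = np.cumsum(log_ratios, axis=1)      # log prod_{k<=t} ratio_k
        discounts = gamma ** np.arange(rewards.shape[1])    # (T,)
        weighted = np.exp(cum_log_ratios) * discounts * rewards
        return weighted.sum(axis=1).mean()                  # average over trajectories

    def mis_balance_heuristic(samples_per_policy, target_logpdf, behavior_logpdfs, f):
        """Multiple IS estimate of E_{x ~ target}[f(x)] with the balance heuristic.

        samples_per_policy: list of arrays, samples_per_policy[j] drawn from policy j
        behavior_logpdfs:   list of callables, behavior_logpdfs[j](x) = log q_j(x)
        The mixture sum_k n_k q_k(x) in the denominator keeps the weights bounded
        even when one behavioral policy alone covers the target poorly.
        """
        counts = np.array([len(x) for x in samples_per_policy])
        total = 0.0
        for x_j in samples_per_policy:
            log_mix = np.logaddexp.reduce(
                [np.log(n_k) + q_k(x_j) for n_k, q_k in zip(counts, behavior_logpdfs)],
                axis=0,
            )                                               # log sum_k n_k q_k(x)
            weights = np.exp(target_logpdf(x_j) - log_mix)  # p(x) / sum_k n_k q_k(x)
            total += np.sum(weights * f(x_j))
        return total

For instance, with Gaussian target and behavioral policies the log-densities passed to mis_balance_heuristic could be built from scipy.stats.norm.logpdf; the balance-heuristic denominator mixes all behavioral densities, which is what limits the weight (and hence variance) blow-up when any single behavioral policy matches the target poorly.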
Files in this record:

File: 20-124.pdf
Access: open access
Description: Publisher's version
Size: 1.57 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1145929
Citations
  • PubMed Central: n/a
  • Scopus: 27
  • Web of Science: 17