RE.PUBLIC@POLIMI Research Publications of the Politecnico di Milano

Policy Feedback in Deep Reinforcement Learning to Exploit Expert Knowledge

Espositi, Federico; Bonarini, Andrea

2021-01-01

Abstract

In Deep Reinforcement Learning (DRL), agents learn by sampling transitions from a batch of stored data called Experience Replay. In most DRL algorithms, the Experience Replay is filled by experiences gathered by the learning agent itself. However, agents that are trained completely Off-Policy, based on experiences gathered by behaviors that are completely decoupled from their own, cannot learn to improve their own policies. In general, the more algorithms train agents Off-Policy, the more they become prone to divergence. The main contribution of this research is the proposal of a novel learning framework called Policy Feedback, used both as a tool to leverage offline-collected expert experiences, and also as a general framework to improve the understanding of the issues behind Off-Policy Learning.

Year of publication	2021
Book title	Machine Learning, Optimization, and Data Science
Series title	Lecture Notes in Computer Science
ISBN (International Standard Book Number)	978-3-030-64583-0
Keywords	Experience replay; Policy feedback; Deep Reinforcement Learning
Publication type	04.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
Abstract_2.pdf (Open Access from 09/01/2022; Description: Main article, Post-Print (DRAFT or Author's Accepted Manuscript, AAM))	1.2 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1170902
