Learning a Belief Representation for Delayed Reinforcement Learning

Liotet P.; Venneri E.; Restelli M.
2021-01-01

Abstract

This paper considers sequential decision-making problems where the interactions between an agent and its environment are affected by delays. Delays may be present in the state observation, in the action execution, or in the reward collection. We consider the delayed Markov Decision Process (MDP) framework both in the case of deterministic and stochastic delays. Given the hardness of the delayed MDP problem, we use a heuristic approach to design an algorithm that uses the belief over the current unobserved state to select its action. We design a self-attention prediction module which, given the last observed state and the following sequence of actions, estimates the beliefs over the following states. This algorithm is able to deal with deterministic delays and could potentially be extended to stochastic delays. We empirically evaluate the effectiveness of the proposed approach in both deterministic and stochastic control problems affected by deterministic delays.
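The abstract describes the delayed-MDP setting, where the agent must act on a state observed several steps in the past together with the actions it has taken since. A minimal sketch of that augmented information state is shown below; the wrapper class, its method names, and the constant-delay buffering are illustrative assumptions, not the authors' implementation (which additionally learns a belief over the unobserved state with a self-attention module).

```python
from collections import deque

class DelayedObservationWrapper:
    """Hypothetical sketch: under a constant observation delay, the agent's
    information state is the last observed state plus the queue of actions
    taken since that state, as in the delayed-MDP formulation."""

    def __init__(self, delay):
        self.delay = delay
        self.state_buffer = deque()   # true states not yet revealed
        self.action_queue = deque()   # actions taken since last observed state
        self.last_observed = None

    def reset(self, initial_state):
        self.state_buffer.clear()
        self.action_queue.clear()
        self.last_observed = initial_state
        return (self.last_observed, tuple(self.action_queue))

    def step(self, action, next_state):
        # record the action taken and the resulting (still hidden) state
        self.state_buffer.append(next_state)
        self.action_queue.append(action)
        # after `delay` steps, the oldest state becomes observable and the
        # corresponding action leaves the queue
        if len(self.state_buffer) > self.delay:
            self.last_observed = self.state_buffer.popleft()
            self.action_queue.popleft()
        return (self.last_observed, tuple(self.action_queue))
```

In the paper's approach, the pair returned by `step` would feed a prediction module that estimates beliefs over the unobserved current state before an action is selected.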
2021
Proceedings of the International Joint Conference on Neural Networks
ISBN: 978-1-6654-3900-8
Keywords: belief; delays; masked autoregressive flows; reinforcement learning; self-attention network
File in this record:
Learning_a_Belief_Representation_for_Delayed_Reinforcement_Learning.pdf (Publisher's version, Adobe PDF, 6 MB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1208256
Citations
  • Scopus: 7