This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in optimization literature, so far little attention has been paid to the extension of such techniques to face sequential decision problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. In order to compute the Newton direction we provide the formulation of the Hessian of the expected return, a technique for variance reduction in the sample-based estimation and a finite sample analysis in the case of Normal distribution. Beside discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex dynamical quadrotor system.

Following Newton direction in Policy Gradient with parameter exploration

MANGANINI, GIORGIO;PIROTTA, MATTEO;RESTELLI, MARCELLO;BASCETTA, LUCA
2015

Abstract

This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in optimization literature, so far little attention has been paid to the extension of such techniques to face sequential decision problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. In order to compute the Newton direction we provide the formulation of the Hessian of the expected return, a technique for variance reduction in the sample-based estimation and a finite sample analysis in the case of Normal distribution. Beside discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex dynamical quadrotor system.
Proceedings of the International Joint Conference on Neural Networks
9781479919604
INF,AUT
File in questo prodotto:
File Dimensione Formato  
IJCNN2015.pdf

embargo fino al 31/01/2017

: Publisher’s version
Dimensione 1.07 MB
Formato Adobe PDF
1.07 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/983864
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 0
social impact