Search or split: policy gradient with adaptive policy space

Tedeschi, Gianmarco; Papini, Matteo; Metelli, Alberto Maria; Restelli, Marcello
2025-01-01

Abstract

Policy search is one of the most effective classes of reinforcement learning methods for solving continuous control tasks. These methods seek a good policy for an agent by fixing a family of parametric policies and then searching directly for the parameters that optimize the long-term reward. However, this parametric policy space represents just a subset of all Markovian policies, and finding a good parametrization for a given task is a challenging problem in its own right, typically left to human expertise. In this paper, we propose a novel, model-free, adaptive-space policy search algorithm, GAPS (Gradient-based Adaptive Policy Search). We start from a simple policy space; once a good policy has been found within it, we use the observations received from the unknown environment to decide whether to expand the policy space. By iterating this process, we obtain a parametric policy whose structure (including the number of parameters) is fitted to the problem at hand without any prior knowledge of the task. Finally, we test our algorithm on a selection of continuous control tasks, evaluating the learning process with adaptive policy spaces and comparing the results with traditional policy optimization methods that employ a fixed policy space.
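To make the adaptive-space loop concrete, below is a minimal, self-contained Python sketch of the scheme the abstract describes: run policy-gradient updates within the current parametric space and, when returns plateau, enlarge the space with a new feature while preserving the current policy. Everything here (the toy 1-D task, the vanilla REINFORCE estimator, and the plateau-based expansion test) is an illustrative assumption, not the authors' actual GAPS algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, features, horizon=50):
    """One episode on a toy 1-D regulation task: drive the state toward 0."""
    s, traj, ret = 3.0, [], 0.0
    for _ in range(horizon):
        phi = np.array([f(s) for f in features])
        mu = theta @ phi
        a = rng.normal(mu, 0.5)                # Gaussian policy N(mu, 0.5^2)
        traj.append((phi, a, mu))
        ret += -s ** 2                         # quadratic cost as (negative) reward
        s = 0.9 * s + a + rng.normal(0.0, 0.05)
    return traj, ret

def pg_update(theta, features, lr=1e-5, n=20):
    """Vanilla REINFORCE update; returns new parameters and the mean return."""
    grad, mean_ret = np.zeros_like(theta), 0.0
    for _ in range(n):
        traj, ret = rollout(theta, features)
        mean_ret += ret / n
        for phi, a, mu in traj:
            # gradient of log N(a; mu, sigma^2) w.r.t. theta, with sigma = 0.5
            grad += (a - mu) * phi / 0.25 * ret / n
    return theta + lr * grad, mean_ret

# Start from the simplest policy space (a constant feature); richer
# candidate features wait to be added when the space is expanded.
features = [lambda s: 1.0]
candidates = [lambda s: s, lambda s: s ** 2]
theta = np.zeros(1)
prev_ret = -np.inf

for it in range(200):
    theta, mean_ret = pg_update(theta, features)
    # Expansion test (illustrative): if returns have plateaued and a richer
    # space is available, add one feature with a zero-initialized parameter,
    # so the expanded policy initially coincides with the current one.
    if candidates and abs(mean_ret - prev_ret) < 1.0:
        features.append(candidates.pop(0))
        theta = np.append(theta, 0.0)
    prev_ret = mean_ret
```

Zero-initializing the new parameter is the natural choice in this sketch: the expanded policy then coincides exactly with the one already found, so enlarging the space can never degrade the current solution.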
Files in this product:
  • s10994-025-06820-2.pdf — Publisher's version, Adobe PDF, 2.47 MB (restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1298939
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: 0