Search or split: policy gradient with adaptive policy space
Gianmarco Tedeschi, Matteo Papini, Alberto Maria Metelli, Marcello Restelli
2025-01-01
Abstract
Policy search is one of the most effective classes of reinforcement learning methods for solving continuous control tasks. These methods attempt to find a good policy for an agent by fixing a family of parametric policies and then searching directly for the parameters that optimize the long-term reward. However, this parametric policy space represents just a subset of all possible Markovian policies, and finding a good parametrization for a given task is a challenging problem in its own right, typically left to human expertise. In this paper, we propose a novel, model-free, adaptive-space policy search algorithm, GAPS (Gradient-based Adaptive Policy Search). We start from a simple policy space; once we have found a good policy within it, we evaluate, based on the observations we receive from the unknown environment, the possibility of expanding the policy space. Iterating this process, we obtain a parametric policy whose structure (including the number of parameters) is fitted to the problem at hand without any prior knowledge of the task. Finally, we test our algorithm on a selection of continuous control tasks, evaluating the learning process with adaptive policy spaces and comparing the results with traditional policy optimization methods that employ a fixed policy space.
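Since the abstract only outlines GAPS at a high level (and the full paper is access-restricted here), the following is a minimal, hypothetical sketch of such an adaptive-space loop: a policy gradient method optimizes inside the current parametric space ("search"), and the space is enlarged only while expansion keeps paying off ("split"). The toy 1-D environment, the nested polynomial feature spaces, the expansion test, and all names are illustrative assumptions, not the authors' actual algorithm or API.

```python
# Hypothetical sketch of an adaptive-space policy search loop.
# The environment, feature spaces, and expansion rule are illustrative
# assumptions; they are not the GAPS algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)

def make_features(degree):
    """Nested polynomial feature maps: each space contains the previous one."""
    return lambda s: np.array([s ** k for k in range(degree + 1)])

def reinforce_step(theta, features, lr=0.01, n_traj=20, horizon=50):
    """One REINFORCE step within the current policy space.

    Gaussian policy a ~ N(phi(s) @ theta, 1) on a toy 1-D task:
    start at s = 2, reward is the negative distance from the origin.
    """
    grads, rets = [], []
    for _ in range(n_traj):
        s, g, ret = 2.0, np.zeros_like(theta), 0.0
        for _ in range(horizon):
            phi = features(s)
            eps = rng.normal()
            s += 0.1 * (phi @ theta + eps)
            g += eps * phi          # grad of log N(a; phi @ theta, 1) w.r.t. theta
            ret += -abs(s)
        grads.append(g)
        rets.append(ret)
    rets = np.asarray(rets)
    grad = np.mean([(r - rets.mean()) * g for r, g in zip(rets, grads)], axis=0)
    return theta + lr * grad, rets.mean()

theta, degree, prev_ret = np.zeros(1), 0, -np.inf
for space_iter in range(4):                  # try at most 4 nested spaces
    feats = make_features(degree)
    for _ in range(200):                     # "search": optimize in the current space
        theta, avg_ret = reinforce_step(theta, feats)
    print(f"degree {degree}: average return {avg_ret:.1f}")
    # "split": stop if the last expansion did not improve returns enough
    if space_iter > 0 and avg_ret - prev_ret < 1.0:
        break
    prev_ret = avg_ret
    degree += 1
    theta = np.concatenate([theta, [0.0]])   # embed the old policy in the larger space
```

Because the feature spaces are nested, padding the learned parameters with a zero embeds the converged policy exactly into the larger space, so each expansion can only build on what was already learned; the improvement threshold used here is an arbitrary placeholder.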
| File | Size | Format |
|---|---|---|
| s10994-025-06820-2.pdf (Publisher's version, restricted access) | 2.47 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


