Building Surrogate Models Using Trajectories of Agents Trained by Reinforcement Learning

Restelli M.
2024-01-01

Abstract

Sample efficiency in the face of computationally expensive simulations is a common concern in surrogate modeling. Current strategies to minimize the number of samples needed are less effective in simulated environments with wide state spaces. In response to this challenge, we propose a novel method to efficiently sample simulated deterministic environments using policies trained by Reinforcement Learning. We provide an extensive analysis of these surrogate-building strategies, comparing them against Latin Hypercube sampling and Active Learning with Kriging, and cross-validating performance across all sampled datasets. The analysis shows that a mixed dataset, including samples acquired by random agents, expert agents, and agents trained to explore the regions of maximum entropy of the state transition distribution, provides the best scores across all datasets, which is crucial for a meaningful state space representation. We conclude that the proposed method improves on the state of the art and clears the path for applying surrogate-aided Reinforcement Learning policy optimization strategies to complex simulators.
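
The following is a minimal, self-contained sketch of the idea described in the abstract, assuming a hypothetical one-dimensional deterministic environment and crude hard-coded stand-ins for the random, expert, and entropy-seeking agents (the paper's actual environments, trained policies, and surrogate architecture are not reproduced here). It mixes trajectories from the three policies, fits a Kriging surrogate (Gaussian process regression) to the resulting transitions, and contrasts this with a Latin Hypercube baseline that, under the same sample budget, samples the state-action space directly.

```python
# Hedged sketch: surrogate of deterministic dynamics from a mixed dataset of
# agent trajectories, vs. a Latin Hypercube baseline. The toy environment and
# the three policies below are illustrative stand-ins, not the paper's setup.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def step(s, a):
    # Deterministic toy dynamics: damped state plus a saturated control input.
    return 0.9 * s + 0.1 * np.tanh(a)

def rollout(policy, horizon=50):
    # One episode of (state, action, next_state) transitions.
    s, data = rng.uniform(-1.0, 1.0), []
    for _ in range(horizon):
        a = policy(s)
        s2 = step(s, a)
        data.append((s, a, s2))
        s = s2
    return data

# Stand-ins for the three agent types that make up the mixed dataset.
random_agent = lambda s: rng.uniform(-2.0, 2.0)    # uniform random actions
expert_agent = lambda s: -2.0 * s                  # drives the state to zero
entropy_agent = lambda s: 2.0 if s >= 0 else -2.0  # pushes toward rarely
                                                   # visited extreme states

mixed = []
for policy in (random_agent, expert_agent, entropy_agent):
    mixed += rollout(policy)
X = np.array([[s, a] for s, a, _ in mixed])
y = np.array([s2 for _, _, s2 in mixed])

# Kriging surrogate (Gaussian process regression) on the mixed trajectories.
gp_mixed = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)

# Latin Hypercube baseline: sample (state, action) pairs directly, same budget.
lhs = qmc.LatinHypercube(d=2, seed=0).random(n=len(mixed))
X_lhs = qmc.scale(lhs, [-1.0, -2.0], [1.0, 2.0])
y_lhs = np.array([step(s, a) for s, a in X_lhs])
gp_lhs = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_lhs, y_lhs)

# Query the surrogates instead of the (expensive) simulator.
query = np.array([[0.5, -1.0]])
print("mixed-trajectory surrogate:", gp_mixed.predict(query)[0])
print("Latin Hypercube surrogate: ", gp_lhs.predict(query)[0])
```

In the paper, the entropy-seeking agent is trained with Reinforcement Learning to maximize the entropy of the state transition distribution; the hard-coded entropy_agent above only mimics its effect of steering trajectories toward rarely visited states.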
Year: 2024
Published in: Artificial Neural Networks and Machine Learning – ICANN 2024
ISBN: 9783031723407, 9783031723414
Keywords: Reinforcement Learning; Surrogate models; Sampling; Entropy maximization
Files in this item:
building2024cestero.pdf: Publisher's version, Adobe PDF, 17.52 MB, restricted access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1288625
Citations:
  • Scopus: 2
  • Web of Science: 1