RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

Online reinforcement learning agents are now able to process an increasing amount of data which makes their approximation and compression into value functions a more demanding task. To improve approximation, thus the learning process itself, it has been proposed to select randomly a mini-batch of the past experiences that are stored in the replay memory buffer to be replayed at each learning step. In this work, we present an algorithm that classifies and samples the experiences into separate contextual memory buffers using an unsupervised learning technique. This allows each new experience to be associated to a mini-batch of the past experiences that are not from the same contextual buffer as the current one, thus further reducing the correlation between experiences. Experimental results show that the correlation minimizing sampling improves over Q-learning algorithms with uniform sampling, and that a significant improvement can be observed when coupled with the sampling methods that prioritize on the experience temporal difference error.

Correlation minimizing replay memory in temporal-difference reinforcement learning

Ramicic M.;Bonarini A.

2020-01-01

Abstract

Online reinforcement learning agents are now able to process an increasing amount of data which makes their approximation and compression into value functions a more demanding task. To improve approximation, thus the learning process itself, it has been proposed to select randomly a mini-batch of the past experiences that are stored in the replay memory buffer to be replayed at each learning step. In this work, we present an algorithm that classifies and samples the experiences into separate contextual memory buffers using an unsupervised learning technique. This allows each new experience to be associated to a mini-batch of the past experiences that are not from the same contextual buffer as the current one, thus further reducing the correlation between experiences. Experimental results show that the correlation minimizing sampling improves over Q-learning algorithms with uniform sampling, and that a significant improvement can be observed when coupled with the sampling methods that prioritize on the experience temporal difference error.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2020
			
	Titolo della rivista
	
				NEUROCOMPUTING
			
	Parole chiave
	
				Artificial neural networks
Reinforcement learning
Replay memory
Temporal-difference learning
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
Correlation_Minimizing_Replay_Memory.pdf accesso aperto Descrizione: Main paper : Post-Print (DRAFT o Author’s Accepted Manuscript-AAM) Dimensione 1.99 MB Formato Adobe PDF Visualizza/Apri	1.99 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1156729

Citazioni

ND

12

12

social impact