RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into a higher order value functions. This expansion of the information collected from the environment increases the agent’s state space enabling it to scale up to more complex problems but also increases the risk of forgetting by learning on redundant or conflicting data. To improve the approximation of a large amount of data, a random mini-batch of the past experiences that are stored in the replay memory buffer is often replayed at each learning step. The proposed work takes inspiration from a biological mechanism which acts as a protective layer of higher cognitive functions found in mammalian brain: active memory consolidation mitigates the effect of forgetting previous memories by dynamically processing the new ones. Similar dynamics are implemented by the proposed augmented memory replay or AMR algorithm. The architecture of AMR, based on a simple artificial neural network is able to provide an augmentation policy which modifies each of the agents experiences by augmenting their relevance prior to storing them in the replay memory. The function approximator of AMR is evolved using genetic algorithm in order to obtain the specific augmentation policy function that yields the best performance of a learning agent in a specific environment given by its received cumulative reward. Experimental results show that an evolved AMR augmentation function capable of increasing the significance of the specific memories is able to further increase the stability and convergence speed of the learning algorithms dealing with the complexity of continuous action domains.

Augmented Memory Replay in Reinforcement Learning With Continuous Control

Ramicic M.;Bonarini A.

2021-01-01

Abstract

Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into a higher order value functions. This expansion of the information collected from the environment increases the agent’s state space enabling it to scale up to more complex problems but also increases the risk of forgetting by learning on redundant or conflicting data. To improve the approximation of a large amount of data, a random mini-batch of the past experiences that are stored in the replay memory buffer is often replayed at each learning step. The proposed work takes inspiration from a biological mechanism which acts as a protective layer of higher cognitive functions found in mammalian brain: active memory consolidation mitigates the effect of forgetting previous memories by dynamically processing the new ones. Similar dynamics are implemented by the proposed augmented memory replay or AMR algorithm. The architecture of AMR, based on a simple artificial neural network is able to provide an augmentation policy which modifies each of the agents experiences by augmenting their relevance prior to storing them in the replay memory. The function approximator of AMR is evolved using genetic algorithm in order to obtain the specific augmentation policy function that yields the best performance of a learning agent in a specific environment given by its received cumulative reward. Experimental results show that an evolved AMR augmentation function capable of increasing the significance of the specific memories is able to further increase the stability and convergence speed of the learning algorithms dealing with the complexity of continuous action domains.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2021
			
	Titolo della rivista
	
				IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS
			
	Parole chiave
	
				active memory consolidation
adaptive actor-critic
Biological neural networks
computational memory architectures
Computational modeling
continuous action spaces
deep neural networks.
deterministic policy gradient
Heuristic algorithms
Learning systems
Memory management
Reinforcement learning
replay memory
Task analysis
Temporal-difference learning
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
Augmented_memory_for_reinforcement_learning_with_continuous_control_IEEE.pdf accesso aperto Descrizione: AAM : Post-Print (DRAFT o Author’s Accepted Manuscript-AAM) Dimensione 1.25 MB Formato Adobe PDF Visualizza/Apri	1.25 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1208445

Citazioni

ND

5

2

social impact