
Entropy-based prioritized sampling in Deep Q-learning

Ramicic, Mirza; Bonarini, Andrea
2017-01-01

Abstract

Online reinforcement learning agents take advantage of an experience replay memory that allows them to reuse past experiences for re-learning, improving the overall efficiency of the learning process. Prioritizing specific transitions during sampling and replay has been shown to improve learning performance further, but in previous approaches the priority of a transition was determined only by its TD error. In this work, we introduce a novel criterion for evaluating the importance of a transition, based on the Shannon entropy of the agent's perceived state space. Furthermore, we compare the performance of different prioritization criteria on one of the simulation environments included in the REINFORCEjs framework. Experimental results show that DQ-ETD, which uses a combination of the entropy and TD-error criteria, outperforms approaches based on the TD-error criterion alone, such as DQ-TD.
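The idea described in the abstract can be sketched in code: each stored transition receives a priority that combines the magnitude of its TD error with the Shannon entropy of the perceived state, and minibatches are then sampled with probability proportional to that priority. The following is a minimal illustrative sketch, not the paper's implementation; the histogram-based state discretization, the additive combination of the two terms, and the `entropy_weight` parameter are all assumptions made for the example.

```python
import numpy as np


def state_entropy(state, bins=8):
    """Shannon entropy of the perceived state features, estimated via a
    histogram discretization (an illustrative choice, not the paper's)."""
    hist, _ = np.histogram(state, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


class PrioritizedReplay:
    """Sketch of a replay memory whose priorities mix |TD error| with
    state entropy, as in the DQ-ETD idea; names are hypothetical."""

    def __init__(self, capacity=10_000, entropy_weight=0.5):
        self.capacity = capacity
        self.entropy_weight = entropy_weight
        self.memory = []  # list of (transition, priority) pairs

    def push(self, transition, td_error):
        # transition is assumed to be (state, action, reward, next_state)
        state = transition[0]
        priority = abs(td_error) + self.entropy_weight * state_entropy(state)
        if len(self.memory) >= self.capacity:
            self.memory.pop(0)  # drop the oldest transition
        self.memory.append((transition, priority))

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority
        priorities = np.array([p for _, p in self.memory])
        probs = priorities / priorities.sum()
        idx = np.random.choice(len(self.memory), size=batch_size, p=probs)
        return [self.memory[i][0] for i in idx]
```

A plain TD-error-only scheme (as in DQ-TD) corresponds to setting `entropy_weight = 0`; the entropy term biases replay toward transitions whose perceived states carry more information.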
Year: 2017
Published in: Proceedings of the IEEE
ISBN: 978-1-5090-6238-6
Keywords: Reinforcement learning; Markov Decision Process; Entropy; Neural Networks; Autonomous agents
Files in this product:
V414.pdf — Publisher's version, open access, Adobe PDF, 691.74 kB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1031981
Citations
  • PMC: not available
  • Scopus: 13
  • Web of Science: 10