Tree-based Fitted Q-iteration for Multi-Objective Markov Decision Problems
Castelletti, Andrea Francesco; Pianosi, Francesca; Restelli, Marcello
2012-01-01
Abstract
This paper addresses multi-objective control problems with a model-free, batch-mode reinforcement-learning approach. Although many real-world applications involve several conflicting objectives, the reinforcement-learning (RL) literature has mainly focused on single-objective control problems. As a consequence, in the presence of multiple objectives, the usual approach is to solve many single-objective control problems (each resulting from a different combination of the original objectives), each one with standard RL techniques. The algorithm proposed in this paper is an extension of Fitted Q-iteration (FQI) that learns, in a single training process, the control policies for all linear combinations of the preferences (weights) assigned to the objectives. The key idea of multi-objective FQI (MOFQI) is to extend the continuous approximation of the action-value function, which single-objective FQI performs over the state-action space, to the weight space as well. The approach is demonstrated on a real-world application well suited to multi-objective RL algorithms: the optimal operation of a multi-purpose water reservoir.
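The core idea in the abstract, approximating the action-value function over the enlarged (state, action, weight) space with a tree-based regressor, can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the authors' implementation: the function names, the use of scikit-learn's ExtraTreesRegressor, and the random sampling of weight vectors on the simplex are choices made here to show how a scalarised, weight-augmented Fitted Q-iteration loop could look for discrete actions.

```python
# Hypothetical sketch of the MOFQI idea: Fitted Q-iteration whose regression
# input space is enlarged from (state, action) to (state, action, weight).
# Names, hyperparameters, and the weight-sampling scheme are assumptions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def mofqi(batch, actions, gamma=0.99, n_iterations=30, n_weight_samples=10, seed=0):
    """batch: list of (s, a, r_vec, s_next), where r_vec holds one reward per objective."""
    rng = np.random.default_rng(seed)
    n_objectives = len(batch[0][2])

    # Replicate each transition under several sampled weight vectors (points on the
    # simplex), scalarising the reward as w . r_vec; the weights become extra inputs.
    rows, rewards, next_states, weights = [], [], [], []
    for (s, a, r_vec, s_next) in batch:
        for _ in range(n_weight_samples):
            w = rng.dirichlet(np.ones(n_objectives))
            rows.append(np.concatenate([s, [a], w]))      # enlarged input (s, a, w)
            rewards.append(np.dot(w, r_vec))              # scalarised reward
            next_states.append(s_next)
            weights.append(w)
    X = np.array(rows)
    rewards = np.array(rewards)

    model = None
    for _ in range(n_iterations):
        if model is None:
            targets = rewards                             # first iteration: immediate reward only
        else:
            # Bootstrapped target: max over actions of the previous Q(s', a', w)
            q_next = np.column_stack([
                model.predict(np.array([np.concatenate([s2, [a2], w])
                                        for s2, w in zip(next_states, weights)]))
                for a2 in actions
            ])
            targets = rewards + gamma * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50, random_state=seed).fit(X, targets)
    return model

def greedy_action(model, s, w, actions):
    """Policy for any preference vector w: argmax over actions of the learned Q(s, a, w)."""
    q = [model.predict(np.concatenate([s, [a], w]).reshape(1, -1))[0] for a in actions]
    return actions[int(np.argmax(q))]
```

Because the weights are part of the regression input, a single trained model yields a greedy policy for any preference vector at query time, which is the single-training-process property the abstract describes.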
| File | Description | Size | Format |
|---|---|---|---|
| 06252759.pdf | Main article (open access, publisher's version) | 1.86 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


