Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems

Authors: Castelletti, Andrea Francesco; Pianosi, Francesca; Restelli, Marcello
Date: 2012-01-01

Abstract

This paper is about solving multi-objective control problems using a model-free, batch-mode reinforcement-learning approach. Although many real-world applications involve several conflicting objectives, the reinforcement-learning (RL) literature has mainly focused on single-objective control problems. As a consequence, in the presence of multiple objectives, the usual approach is to consider many single-objective control problems (resulting from different combinations of the original problem objectives), each one solved using standard RL techniques. The algorithm proposed in this paper is an extension of Fitted Q-iteration (FQI) that makes it possible to learn the control policies for all linear combinations of preferences (weights) assigned to the objectives in a single training process. The key idea of multi-objective FQI (MOFQI) is to extend the continuous approximation of the action-value function, which single-objective FQI performs over the state-action space, to the weight space as well. The approach is demonstrated on an interesting real-world application for multi-objective RL algorithms: the optimal operation of a multi-purpose water reservoir.
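A minimal sketch of the idea described in the abstract, for illustration only: the action-value function is approximated over state, action, and weight jointly, so a single batch training run covers all linear scalarizations. It assumes a finite set of scalar actions and uses scikit-learn's ExtraTreesRegressor as a stand-in for the paper's tree-based regressor; the Dirichlet sampling of weights and all hyper-parameters are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def mofqi(transitions, actions, n_weights=20, n_iterations=50, gamma=0.99):
    """transitions: list of (state, action, reward_vector, next_state) tuples."""
    n_obj = len(transitions[0][2])
    # Sample preference vectors on the simplex; each transition is replicated
    # for every sampled weight, so the regressor learns Q over (state, action, weight).
    weight_samples = np.random.dirichlet(np.ones(n_obj), size=n_weights)

    X, scal_r, ns, ws = [], [], [], []
    for (s, a, r, s_next) in transitions:
        for w in weight_samples:
            X.append(np.concatenate([np.atleast_1d(s), [a], w]))
            scal_r.append(np.dot(w, r))                 # scalarized reward w.r
            ns.append(np.atleast_1d(s_next))
            ws.append(w)
    X = np.array(X)
    scal_r, ns, ws = np.array(scal_r), np.array(ns), np.array(ws)

    model = None
    for _ in range(n_iterations):
        if model is None:
            y = scal_r                                  # Q_1 = immediate scalarized reward
        else:
            # Q_{k+1}(s, a, w) = w.r + gamma * max_a' Q_k(s', a', w)
            q_next = np.column_stack([
                model.predict(np.hstack([ns, np.full((len(ns), 1), a), ws]))
                for a in actions])
            y = scal_r + gamma * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, y)

    return model  # greedy action for (s, w): argmax over a of model.predict([s, a, w])
```

The point of the augmented input space is that, after one training run, the returned model can be queried with any weight vector to recover the corresponding scalarized policy, instead of re-running FQI once per weight combination.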
Year: 2012
Published in: 2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
ISBN: 9781467314886; 9781467314893; 9781467314909
Files in this product:

File: 06252759.pdf
Access: open access
Description: Main article (publisher's version)
Size: 1.86 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/690408
Citations
  • Scopus: 42
  • Web of Science (ISI): 18