Multi-objective Markov decision processes (MOMDPs) provide an effective modeling framework for decision-making problems involving water systems. The traditional approach is to define many single-objective problems (resulting from different combinations of the objectives), each solvable by standard optimization. This paper presents an approach based on reinforcement learning (RL) that can learn the operating policies for all combinations of objectives in a single training process. The key idea is to enlarge the approximation of the action-value function, which is performed by single-objective RL over the state-action space, to the space of the objectives' weights. The batch-mode nature of the algorithm allows for enriching the training dataset without further interaction with the controlled system. The approach is demonstrated on a numerical test case study and evaluated on a real-world application, the Hoa Binh reservoir, Vietnam. Experimental results on the test case show that the proposed approach (multi-objective fitted Q-iteration; MOFQI) becomes computationally preferable over the repeated application of its single-objective version (fitted Q-iteration; FQI) when evaluating more than five weight combinations. In the Hoa Binh case study, the operating policies computed with MOFQI and FQI have comparable efficiency, while MOFQI provides a continuous approximation of the Pareto frontier with no additional computing costs.

Tree-based fitted Q-iteration for multi-objective Markov decision processes in water resource management

PIANOSI, FRANCESCA;CASTELLETTI, ANDREA FRANCESCO;RESTELLI, MARCELLO
2013

Abstract

Multi-objective Markov decision processes (MOMDPs) provide an effective modeling framework for decision-making problems involving water systems. The traditional approach is to define many single-objective problems (resulting from different combinations of the objectives), each solvable by standard optimization. This paper presents an approach based on reinforcement learning (RL) that can learn the operating policies for all combinations of objectives in a single training process. The key idea is to enlarge the approximation of the action-value function, which is performed by single-objective RL over the state-action space, to the space of the objectives' weights. The batch-mode nature of the algorithm allows for enriching the training dataset without further interaction with the controlled system. The approach is demonstrated on a numerical test case study and evaluated on a real-world application, the Hoa Binh reservoir, Vietnam. Experimental results on the test case show that the proposed approach (multi-objective fitted Q-iteration; MOFQI) becomes computationally preferable over the repeated application of its single-objective version (fitted Q-iteration; FQI) when evaluating more than five weight combinations. In the Hoa Binh case study, the operating policies computed with MOFQI and FQI have comparable efficiency, while MOFQI provides a continuous approximation of the Pareto frontier with no additional computing costs.
AUT; INF
File in questo prodotto:
File Dimensione Formato  
HYDRO-D-11-00169.pdf

Accesso riservato

: Pre-Print (o Pre-Refereeing)
Dimensione 665.97 kB
Formato Adobe PDF
665.97 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/758944
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 27
  • ???jsp.display-item.citation.isi??? 25
social impact