Integrating Inverse Reinforcement Learning and Direct Policy Search for Modeling Multipurpose Water Reservoir Systems

Giuliani, Matteo; Castelletti, Andrea
2024-01-01

Abstract

System identification and optimal control have long contributed to water resources systems planning and management. Although water control problems are commonly formulated as multi-objective Markov Decision Processes, accurately modeling reservoir systems controlled by human operators remains challenging because the objective function guiding the operators' behavior is not formally defined. In this letter, we introduce a mixed Reinforcement Learning approach to model the dynamics of multipurpose reservoir systems. Specifically, our method first uses Inverse Reinforcement Learning to extract the tradeoff among competing objectives from historical observations of the reservoir system dynamics. The identified objective function is then used to formulate an optimal control problem whose solution is a closed-loop policy that can simulate the observed dynamics of the reservoir system. We demonstrate the potential of the proposed method in a real-world application involving the multipurpose regulation of Lake Como in northern Italy. Results show that our approach effectively infers the tradeoff between flood control and water supply adopted in the observed system's operation, and yields a control policy that closely approximates the observed system dynamics.
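
As an illustration only, the two-step idea described in the abstract can be sketched on a toy problem. The sketch below is not the authors' implementation: the synthetic inflow, the proportional release rule, the two cost functions, and every constant and function name are assumptions made for this example, and a plain grid search stands in for both the Inverse Reinforcement Learning step and the direct policy search. It first recovers a tradeoff weight from "observed" storages, then re-optimizes a closed-loop policy under that weight and replays it.

```python
import math

# Toy setup: every number here is an illustrative assumption, not Lake Como data.
INFLOW = [30 + 25 * math.sin(2 * math.pi * t / 12) for t in range(48)]  # synthetic inflow
DEMAND = 32.0          # constant water demand
FLOOD_LEVEL = 120.0    # storage above this level counts as flooding
S0 = 100.0             # initial storage

def simulate(theta):
    """Run the closed-loop rule release = theta * storage; return storages and releases."""
    s, storages, releases = S0, [], []
    for q in INFLOW:
        r = min(max(theta * s, 0.0), s + q)  # release cannot exceed available water
        s = s + q - r                         # mass balance
        storages.append(s)
        releases.append(r)
    return storages, releases

def objectives(theta):
    """Two competing costs: mean flood excess and mean squared supply deficit."""
    storages, releases = simulate(theta)
    flood = sum(max(s - FLOOD_LEVEL, 0.0) for s in storages) / len(storages)
    deficit = sum(max(DEMAND - r, 0.0) ** 2 for r in releases) / len(releases)
    return flood, deficit

THETAS = [i / 100 for i in range(5, 96)]  # candidate policy parameters
WEIGHTS = [i / 20 for i in range(21)]     # candidate tradeoff weights in [0, 1]
COSTS = {th: objectives(th) for th in THETAS}

def policy_search(w):
    """Grid-based policy search: minimize w * flood + (1 - w) * deficit."""
    return min(THETAS, key=lambda th: w * COSTS[th][0] + (1 - w) * COSTS[th][1])

def infer_weight(observed_storage):
    """Pick the weight whose optimal policy best reproduces the observed storages."""
    def mismatch(w):
        sim, _ = simulate(policy_search(w))
        return sum((a - b) ** 2 for a, b in zip(sim, observed_storage))
    return min(WEIGHTS, key=mismatch)

# Stand-in for the historical record: an "operator" acting with a hidden weight.
W_TRUE = 0.7
observed, _ = simulate(policy_search(W_TRUE))

# Step 1: infer the tradeoff. Step 2: optimize a policy under it and replay it.
w_hat = infer_weight(observed)
theta_hat = policy_search(w_hat)
replayed, _ = simulate(theta_hat)
error = sum((a - b) ** 2 for a, b in zip(replayed, observed))
print(w_hat, theta_hat, error)
```

Because several weights can lead to the same optimal policy, the inferred weight in this toy need not equal the hidden one; what the sketch checks is that the policy optimized under the inferred weight reproduces the observed storages.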
Files in this record:
Giuliani2024_IRL.pdf (open access, publisher's version, Adobe PDF, 1.33 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1275666