
Risk-averse optimization of reward-based coherent risk measures

Bisi L.; Restelli M.
2023-01-01

Abstract

In real-world problems such as robotics, finance, and healthcare, randomness is always present; it is therefore important to take risk into account in order to limit the chance of rare but dangerous events. The literature on risk-averse reinforcement learning offers many approaches to the problem, but they either struggle to scale to complex instances or exhibit irrational behaviors. Here we present two novel risk-averse objectives that are both coherent and easy to optimize: the reward-based mean-mean absolute deviation (Mean-RMAD) and the reward-based conditional value at risk (RCVaR). Instead of reducing the risk of the return, these measures reduce the risk of the per-step reward. We prove that these risk measures bound the corresponding return-based risk measures, so they can also be used as proxies for their return-based versions. We develop safe algorithms for these risk measures with guaranteed monotonic improvement, along with practical trust-region versions. Furthermore, we propose a decomposition of the RCVaR optimization problem into a sequence of risk-neutral problems. Finally, we conduct an empirical analysis of the introduced approaches, demonstrating their effectiveness in obtaining a variety of risk-averse behaviors on both toy problems and more challenging ones, such as a simulated trading environment and robotic locomotion tasks.
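As an informal illustration of the quantities named above (not the paper's exact reward-based formulation, which also specifies how discounting and the state-occupancy distribution enter the per-step reward distribution), the two risk measures can be estimated from sampled per-step rewards with standard empirical estimators. The Python sketch below uses hypothetical function names, a mean-minus-MAD trade-off coefficient c, and a CVaR level alpha chosen purely for illustration.

import numpy as np

# Illustrative sketch only: standard empirical estimators of the two risk measures
# named in the abstract, applied to sampled per-step rewards. Function names, the
# trade-off coefficient c, and the level alpha are assumptions for illustration,
# not the authors' implementation.

def mean_rmad_objective(rewards: np.ndarray, c: float = 0.5) -> float:
    """Mean of the per-step rewards minus c times their mean absolute deviation."""
    mean = rewards.mean()
    mad = np.abs(rewards - mean).mean()
    return mean - c * mad

def reward_cvar(rewards: np.ndarray, alpha: float = 0.1) -> float:
    """Empirical CVaR at level alpha of the per-step reward distribution:
    the average of the worst alpha-fraction of sampled rewards."""
    var = np.quantile(rewards, alpha)       # lower alpha-quantile (value at risk)
    tail = rewards[rewards <= var]          # worst alpha-fraction of the samples
    return tail.mean()

# Example usage on rewards collected along trajectories of some policy.
rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=2.0, size=10_000)
print(mean_rmad_objective(rewards, c=0.5), reward_cvar(rewards, alpha=0.05))

The decomposition into a sequence of risk-neutral problems mentioned in the abstract is plausibly related to the well-known Rockafellar-Uryasev variational form of CVaR, CVaR_alpha(X) = max_nu { nu - E[(nu - X)_+] / alpha }, in which, for a fixed nu, the inner term is an ordinary expectation and can be optimized with risk-neutral machinery; the paper itself should be consulted for the actual construction.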
Keywords: Coherent risk measure; CVaR; Mean absolute deviation; Reinforcement learning; Reward-based measure; Risk-averse

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11311/1261002
Citations
  • Scopus: 1