Reinforcement learning has proven to be successful in obtaining profitable trading policies; however, the effectiveness of such strategies is strongly conditioned to market stationarity. This hypothesis is challenged by the regime switches frequently experienced by practitioners; thus, when many models are available, validation may become a difficult task. We propose to overcome the issue by explicitly modeling the trading task as a non-stationary reinforcement learning problem. Nevertheless, state-of-the-art RL algorithms for this setting usually require task distribution or dynamics to be predictable, an assumption that can hardly be true in the financial framework. In this work, we propose, instead, a method for the dynamic selection of the best RL agent which is only driven by profit performance. Our modular two-layer approach allows choosing the best strategy among a set of RL models through an online-learning algorithm. While we could select any combination of algorithms in principle, our solution employs two state-of-the-art algorithms: Fitted Q-Iteration (FQI) for the RL layer and Optimistic Adapt ML-Prod (OAMP) for the online learning one. The proposed approach is tested on two simulated FX trading tasks, using actual historical data for the AUS/USD and GBP/USD currency pairs.
Addressing Non-Stationarity in FX Trading with Online Model Selection of Offline RL Experts
Riva A.;Bisi L.;Liotet P.;Sabbioni L.;Vittori E.;Restelli M.
2022-01-01
Abstract
Reinforcement learning has proven to be successful in obtaining profitable trading policies; however, the effectiveness of such strategies is strongly conditioned to market stationarity. This hypothesis is challenged by the regime switches frequently experienced by practitioners; thus, when many models are available, validation may become a difficult task. We propose to overcome the issue by explicitly modeling the trading task as a non-stationary reinforcement learning problem. Nevertheless, state-of-the-art RL algorithms for this setting usually require task distribution or dynamics to be predictable, an assumption that can hardly be true in the financial framework. In this work, we propose, instead, a method for the dynamic selection of the best RL agent which is only driven by profit performance. Our modular two-layer approach allows choosing the best strategy among a set of RL models through an online-learning algorithm. While we could select any combination of algorithms in principle, our solution employs two state-of-the-art algorithms: Fitted Q-Iteration (FQI) for the RL layer and Optimistic Adapt ML-Prod (OAMP) for the online learning one. The proposed approach is tested on two simulated FX trading tasks, using actual historical data for the AUS/USD and GBP/USD currency pairs.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.