In most of the transfer learning approaches to reinforcement learning (RL) the distribution over the tasks is assumed to be stationary. Therefore, the target and source tasks are i.i.d. samples of the same distribution. Unfortunately, this assumption rarely holds in real-world conditions, e.g., due to seasonality or periodicity, evolution in the environment or faults in the sensors/actuators. In the context of this work, we consider the problem of transferring value functions through a variational method when the distribution that generates the tasks is time-variant, proposing a solution that leverages this temporal structure inherent in the task generating process. Furthermore, by means of a finite-sample analysis, the previously mentioned solution is theoretically compared to its time-invariant version. Finally, the experimental evaluation of the proposed technique is carried out on the lake Como water system representing a real-world scenario and on three different RL environments with three distinct temporal dynamics.

Time-Variant Variational Transfer for Value Functions

Canonaco G.;Giuliani M.;Castelletti A.;Roveri M.;Restelli M.
2021-01-01

Abstract

In most of the transfer learning approaches to reinforcement learning (RL) the distribution over the tasks is assumed to be stationary. Therefore, the target and source tasks are i.i.d. samples of the same distribution. Unfortunately, this assumption rarely holds in real-world conditions, e.g., due to seasonality or periodicity, evolution in the environment or faults in the sensors/actuators. In the context of this work, we consider the problem of transferring value functions through a variational method when the distribution that generates the tasks is time-variant, proposing a solution that leverages this temporal structure inherent in the task generating process. Furthermore, by means of a finite-sample analysis, the previously mentioned solution is theoretically compared to its time-invariant version. Finally, the experimental evaluation of the proposed technique is carried out on the lake Como water system representing a real-world scenario and on three different RL environments with three distinct temporal dynamics.
2021
37th Conference on Uncertainty in Artificial Intelligence, UAI 2021
File in questo prodotto:
File Dimensione Formato  
canonaco21a.pdf

accesso aperto

: Publisher’s version
Dimensione 509.23 kB
Formato Adobe PDF
509.23 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1208258
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact