Although it is well-known that humans commonly engage in risk-sensitive behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) models assume a risk-neutral agent. As such, beyond piq introducing model misspecification, piiq they do not permit direct inference of the risk attitude of the observed agent, which can be useful in many applications. In this paper, we propose a novel model of behavior to cope with these issues. By allowing for risk sensitivity, our model alleviates piq, and by explicitly representing risk attitudes through (learnable) utility functions, it solves piiq. Then, we characterize the partial identifiability of an agent’s utility under the new model and note that demonstrations from multiple environments mitigate the problem. We devise two provably-efficient algorithms for learning utilities in a finite-data regime, and we conclude with some proof-of-concept experiments to validate both our model and our algorithms.
Learning Utilities from Demonstrations in Markov Decision Processes
F. Lazzati;A. M. Metelli
2025-01-01
Abstract
Although it is well-known that humans commonly engage in risk-sensitive behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) models assume a risk-neutral agent. As such, beyond piq introducing model misspecification, piiq they do not permit direct inference of the risk attitude of the observed agent, which can be useful in many applications. In this paper, we propose a novel model of behavior to cope with these issues. By allowing for risk sensitivity, our model alleviates piq, and by explicitly representing risk attitudes through (learnable) utility functions, it solves piiq. Then, we characterize the partial identifiability of an agent’s utility under the new model and note that demonstrations from multiple environments mitigate the problem. We devise two provably-efficient algorithms for learning utilities in a finite-data regime, and we conclude with some proof-of-concept experiments to validate both our model and our algorithms.| File | Dimensione | Formato | |
|---|---|---|---|
|
main.pdf
accesso aperto
Dimensione
1.16 MB
Formato
Adobe PDF
|
1.16 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


