RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano

Learning Utilities from Demonstrations in Markov Decision Processes

F. Lazzati;A. M. Metelli

2025-01-01

Abstract

Although it is well-known that humans commonly engage in risk-sensitive behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) models assume a risk-neutral agent. As such, beyond (i) introducing model misspecification, (ii) they do not permit direct inference of the risk attitude of the observed agent, which can be useful in many applications. In this paper, we propose a novel model of behavior to cope with these issues. By allowing for risk sensitivity, our model alleviates (i), and by explicitly representing risk attitudes through (learnable) utility functions, it solves (ii). Then, we characterize the partial identifiability of an agent’s utility under the new model and note that demonstrations from multiple environments mitigate the problem. We devise two provably-efficient algorithms for learning utilities in a finite-data regime, and we conclude with some proof-of-concept experiments to validate both our model and our algorithms.

Year of publication	2025
Book title	Proceedings of the 42nd International Conference on Machine Learning
Series title	Proceedings of Machine Learning Research
Appears in categories:	04.1 Conference proceedings contribution

Files in this item:

File	Size	Format
main.pdf (open access)	1.16 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1291868
