Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback

Simone Drago, Marco Mussi, Alberto Maria Metelli
2025-01-01

Abstract

The success of sequential decision-making approaches, such as reinforcement learning (RL), is closely tied to the availability of reward feedback. However, designing a reward function that encodes the desired objective is a challenging task. In this work, we address a more realistic scenario: sequential decision making with preference feedback provided, for instance, by a human expert. We aim to build a theoretical basis linking preferences, (non-Markovian) utilities, and (Markovian) rewards, and we study the connections between them. First, we model preference feedback using a partial (pre)order over trajectories, enabling the presence of incomparabilities that are common when preferences are provided by humans but are surprisingly overlooked in existing works. Second, to provide a theoretical justification for a common practice, we investigate how a preference relation can be approximated by a multi-objective utility. We introduce a notion of preference-utility compatibility and analyze the computational complexity of this transformation, showing that constructing the minimum-dimensional utility is NP-hard. Third, we propose a novel concept of preference-based policy dominance that does not rely on utilities or rewards and discuss the computational complexity of assessing it. Fourth, we develop a computationally efficient algorithm to approximate a utility using (Markovian) rewards and quantify the error in terms of the suboptimality of the optimal policy induced by the approximating reward. This work aims to lay the foundation for a principled approach to sequential decision making from preference feedback, with promising potential applications in RL from human feedback.
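As a toy illustration of the first contribution, the sketch below shows how preference feedback over trajectories can be stored as pairwise comparisons and closed into a preorder, with incomparable pairs (which a total order cannot represent) being exactly those unrelated in either direction. This is a hedged sketch under assumptions of this record, not code from the paper; the trajectory names and preference pairs are hypothetical.

```python
# Minimal sketch (illustrative, not from the paper): preference feedback as a
# partial preorder over trajectories, allowing incomparable pairs.
from itertools import product

trajectories = ["tau1", "tau2", "tau3", "tau4"]  # hypothetical trajectory ids

# Observed preferences: (a, b) means "a is weakly preferred to b".
preferences = {("tau1", "tau2"), ("tau2", "tau3")}

def preorder_closure(pairs, elements):
    """Close the observed pairs under reflexivity and transitivity,
    the two axioms of a preorder."""
    rel = set(pairs) | {(x, x) for x in elements}
    changed = True
    while changed:
        changed = False
        # product() snapshots rel, so adding to rel inside the loop is safe;
        # the outer while-loop picks up newly derived pairs.
        for (a, b), (c, d) in product(rel, rel):
            if b == c and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel

rel = preorder_closure(preferences, trajectories)

def incomparable(a, b):
    """Two trajectories are incomparable when neither is preferred to the other."""
    return (a, b) not in rel and (b, a) not in rel

print(("tau1", "tau3") in rel)       # True: follows by transitivity
print(incomparable("tau1", "tau4"))  # True: no preference involves tau4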
42nd International Conference on Machine Learning, ICML 2025
Files in this record:
_ICML_2025___Camera_Ready__Preference_based_Framework (1).pdf — Adobe PDF, 439.14 kB, open access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1292596
Citations
  • PubMed Central: ND
  • Scopus: 1
  • Web of Science: 0