
Improving multi-armed bandit algorithms in online pricing settings

Trovò, Francesco; Paladino, Stefano; Restelli, Marcello; Gatti, Nicola
2018-01-01

Abstract

The design of effective bandit algorithms to learn the optimal price is a task of extraordinary importance in all the settings in which the demand curve is not a priori known and the estimation process takes a long time, as is customary, e.g., in e-commerce scenarios. In particular, the adoption of effective pricing algorithms may allow companies to increase their profits dramatically. In this paper, we exploit the structure of the pricing problem in online scenarios to improve the performance of state-of-the-art general-purpose bandit algorithms. More specifically, we make use of the monotonicity of the customer demand curve, which suggests the same monotone behavior for the conversion rates, and we exploit the fact that, in many scenarios, companies have a priori information about the order of magnitude of the conversion rate. We design techniques—applicable in principle to any bandit algorithm—capable of exploiting these two properties, and we apply them to Upper Confidence Bound policies in both stationary and nonstationary environments. We show that algorithms exploiting these two properties may significantly outperform state-of-the-art bandit policies in most configurations, and we also show that the improvement increases as the number of arms increases. In particular, simulations based on real-world data show that our algorithms may increase the profit by 300% or more when compared to the performance achieved by state-of-the-art bandit algorithms. Furthermore, we formally prove that the empirical improvement provided by our algorithms can be achieved without incurring any cost in terms of theoretical guarantees. Indeed, our algorithms achieve the same asymptotic worst-case regret bounds as the previously known state-of-the-art bandit algorithms.
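The core idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, only an illustration of how monotonicity can tighten a UCB policy: since the conversion rate is nonincreasing in price, the upper confidence bound of a given price can never exceed the bound of any cheaper price, so the bounds can be projected with a running minimum before selecting the profit-maximizing arm. All names, the price grid, and the simulated Bernoulli demand model are assumptions for the example.

```python
import math
import random

def monotone_ucb(prices, bernoulli_rates, horizon, seed=0):
    """UCB1 over an ascending price grid, tightened via monotonicity:
    conversion rates are nonincreasing in price, so the confidence
    bound for each price is capped by the bounds of all cheaper prices.
    Returns the cumulative profit and the per-arm pull counts."""
    rng = random.Random(seed)
    n = len(prices)
    pulls = [0] * n          # times each price was proposed
    successes = [0] * n      # sales observed at each price
    total_profit = 0.0
    for t in range(1, horizon + 1):
        # standard UCB1 bounds on the conversion rate of each price
        ucbs = []
        for i in range(n):
            if pulls[i] == 0:
                ucbs.append(1.0)
            else:
                mean = successes[i] / pulls[i]
                bonus = math.sqrt(2 * math.log(t) / pulls[i])
                ucbs.append(min(mean + bonus, 1.0))
        # monotonicity projection: running minimum over increasing prices
        tightened, running = [], 1.0
        for i in range(n):
            running = min(running, ucbs[i])
            tightened.append(running)
        # propose the price maximizing (price * optimistic conversion rate)
        arm = max(range(n), key=lambda i: prices[i] * tightened[i])
        sale = rng.random() < bernoulli_rates[arm]  # simulated customer
        pulls[arm] += 1
        successes[arm] += int(sale)
        total_profit += prices[arm] * sale
    return total_profit, pulls
```

The projection step shrinks the exploration bonus of overpriced, rarely pulled arms using information gathered at cheaper prices, which is one plausible reading of why the gain reported in the paper grows with the number of arms.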
Multi-Armed Bandit; Nonstationary MAB; Pricing; Software; Theoretical Computer Science; Artificial Intelligence; Applied Mathematics
Files in this record:
File: allegatoF2_trovo2018improving.pdf (restricted access)
Description: Main article (publisher's version)
Size: 1.5 MB, Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1074901
Citations
  • PMC: not available
  • Scopus: 15
  • Web of Science: 6