Dissimilarity Bandits

Paolo Battellani;Alberto Maria Metelli;Francesco Trovo
2024-01-01

Abstract

We study a novel sequential decision-making setting, dissimilarity bandits. At each round, the learner pulls an arm that provides a stochastic d-dimensional observation vector. The learner aims to identify the pair of arms with the maximum dissimilarity, where the dissimilarity index is computed over pairs of expected observation vectors. We propose Successive Elimination for Dissimilarity (SED), a fixed-confidence best-pair identification algorithm based on sequential elimination. SED discards an individual arm when there is statistical evidence that it cannot belong to a pair of most dissimilar arms; it thus exploits the structure of the setting by reusing the estimates of the expected observation vectors across all pairs that share an arm. We provide results on the sample complexity of SED, which depends on H_P, a novel index characterizing the complexity of identifying the pair of most dissimilar arms. We then provide a sample complexity lower bound, which highlights the challenges of the identification problem for dissimilarity bandits and is nearly matched by SED. Finally, we compare our approach against classical and combinatorial best-arm identification algorithms, on synthetically generated data and on a realistic environmental monitoring domain, for the cases d=1 and d>1.
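The elimination idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SED implementation: the function name sed_sketch, the Euclidean distance used as the dissimilarity, the Gaussian noise model, and the confidence radius eps (a per-coordinate sub-Gaussian bound with a union bound over coordinates, arms, and rounds) are all assumptions made here for illustration only.

```python
import itertools
import math
import random


def sed_sketch(pull, n_arms, d, delta=0.05, sigma=1.0, max_rounds=10_000):
    """Illustrative successive elimination for the most dissimilar pair.

    pull(i) returns one d-dimensional observation of arm i (hypothetical API).
    Returns the list of arms still active when the procedure stops.
    """
    active = list(range(n_arms))
    sums = [[0.0] * d for _ in range(n_arms)]
    for t in range(1, max_rounds + 1):
        # Pull every active arm once, so each active arm has exactly t samples.
        for i in active:
            obs = pull(i)
            sums[i] = [s + x for s, x in zip(sums[i], obs)]
        means = {i: [s / t for s in sums[i]] for i in active}
        # Assumed confidence radius on the Euclidean error of each mean vector.
        eps = sigma * math.sqrt(2 * d * math.log(4 * d * n_arms * t * t / delta) / t)
        # Each arm's mean estimate is reused for every pair that contains it.
        est = {(i, j): math.dist(means[i], means[j])
               for i, j in itertools.combinations(active, 2)}
        # By the triangle inequality, a pairwise distance is off by at most 2 * eps.
        best_lower = max(est.values()) - 2 * eps
        upper = {i: max(v for pair, v in est.items() if i in pair) + 2 * eps
                 for i in active}
        # Keep an arm only if some pair containing it could still be the best.
        active = [i for i in active if upper[i] >= best_lower]
        if len(active) == 2:
            break
    return active


# Toy instance (assumed): arms 0 and 3 are the most dissimilar pair.
rng = random.Random(0)
true_means = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]


def pull(i):
    return [rng.gauss(m, 0.1) for m in true_means[i]]


result = sed_sketch(pull, n_arms=4, d=2, sigma=0.1)
print(sorted(result))
```

Arms are eliminated individually rather than pair by pair, which is the structural point made in the abstract: one set of per-arm estimates serves every pair. If several pairs tie for the maximum dissimilarity, this sketch only stops at max_rounds.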
2024
27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024
Files in this item:
File: battellani24a.pdf
Access: open access
Description: Publisher's version
Size: 505.66 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1267545