Robust risk-averse multi-armed bandits with application in social engagement behavior of children with autism spectrum disorder while imitating a humanoid robot

Bonarini A.
2021-01-01

Abstract

The stochastic multi-armed bandit problem is a standard model of the exploration–exploitation trade-off in sequential decision problems. In clinical trials, which are sensitive to outliers, the goal is to learn a risk-averse policy that balances exploration, exploitation, and safety. In this paper, we present a risk-averse multi-armed bandit algorithm for a decision-making problem based on the social engagement behaviors of children with Autism Spectrum Disorder (ASD). The algorithm runs while children interact with a humanoid robot and imitate a sequence of its movements. It builds on the Best Empirical Sampled Average (BESA) algorithm, with Entropic Value-at-Risk (EVaR) as the risk measure, to select the sequence of movements that best improves the children's social engagement during imitation. We provide a detailed experimental analysis comparing the proposed algorithm with several well-known risk-averse multi-armed bandit algorithms on artificial scenarios and on our real-world problem. The results show that the proposed algorithm outperforms its competitors in robustness, risk avoidance, and cumulative regret, and promotes the social engagement behaviors of children with ASD when they imitate a robot's movements.
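
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition of Entropic Value-at-Risk for a loss X at confidence level 1 - alpha, namely the infimum over z > 0 of (1/z) ln(M_X(z) / alpha), where M_X is the moment-generating function, and the usual two-armed Best Empirical Sampled Average step, in which both arms are subsampled to the size of the smaller history before being compared. The function names, the grid search over z, the tie-breaking rule, and the convention that each observation is a loss (for example, a negated engagement score) are all illustrative assumptions.

```python
import math
import random


def empirical_evar(losses, alpha=0.05, z_grid=None):
    """Empirical EVaR at confidence level 1 - alpha for a sample of losses.

    The moment-generating function is replaced by its sample average, and
    the infimum over z > 0 is approximated on a log-spaced grid
    (an illustrative choice, not taken from the paper).
    """
    if not losses:
        raise ValueError("need at least one sample")
    if z_grid is None:
        z_grid = [10 ** (k / 10) for k in range(-30, 31)]  # 1e-3 .. 1e3
    m = max(losses)
    n = len(losses)
    best = math.inf
    for z in z_grid:
        # log of the sample MGF, computed stably by factoring out exp(z * m)
        log_mgf = z * m + math.log(sum(math.exp(z * (x - m)) for x in losses) / n)
        best = min(best, (log_mgf - math.log(alpha)) / z)
    return best


def besa_evar_choice(losses_a, losses_b, alpha=0.05, rng=random):
    """One two-armed BESA-style step: return 0 to pull arm a, 1 to pull arm b.

    Both histories are subsampled without replacement to the size of the
    shorter one, and the arm with the lower empirical EVaR is chosen.
    """
    n = min(len(losses_a), len(losses_b))
    if n == 0:
        # Hypothetical initialization rule: pull an arm with no data first.
        return 0 if len(losses_a) == 0 else 1
    risk_a = empirical_evar(rng.sample(losses_a, n), alpha)
    risk_b = empirical_evar(rng.sample(losses_b, n), alpha)
    if risk_a == risk_b:
        # Ties go to the arm that has been pulled less often.
        return 0 if len(losses_a) <= len(losses_b) else 1
    return 0 if risk_a < risk_b else 1
```

Under these assumptions, an arm with a slightly worse average loss but no extreme outcomes can be preferred over an arm whose average is better but whose history contains a rare large loss, which is the kind of risk-averse behavior the abstract describes.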
Autism Spectrum Disorder
Entropic Value-at-Risk
Multi-Armed Bandits
Risk measure
Risk-averse
Social engagement
Files in this item:
InformationSciences1-s2.0-S0020025521005399-main.pdf
Description: Main article (publisher's version)
Access: restricted
Format: Adobe PDF, 6.99 MB

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1189861