Robust risk-averse multi-armed bandits with application in social engagement behavior of children with autism spectrum disorder while imitating a humanoid robot
Bonarini A.
2021-01-01
Abstract
The stochastic multi-armed bandit problem is a standard model for the exploration–exploitation trade-off in sequential decision problems. In clinical trials, which are sensitive to outlier data, the goal is to learn a risk-averse policy that balances exploration, exploitation, and safety. In this paper, we present a risk-averse multi-armed bandit algorithm for a decision-making problem based on the social engagement behaviors of children with Autism Spectrum Disorder (ASD). The algorithm is applied while children interact with a humanoid robot and imitate sequences of the robot's movements. The proposed algorithm is based on the Best Empirical Sampled Average (BESA) algorithm, with Entropic Value-at-Risk (EVaR) as the risk measure, and decides which sequence of movements best improves the social engagement behaviors of children with ASD while they imitate the robot. We provide a detailed experimental analysis comparing the proposed algorithm with several well-known risk-averse multi-armed bandit algorithms on artificial scenarios and on our real-world problem. The experimental results show that the proposed algorithm outperforms its competitors in terms of robustness, risk avoidance, and cumulative regret, and promotes the social engagement behaviors of children with ASD when they imitate a robot's movements.
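The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of how a BESA-style duel can use EVaR in place of the empirical mean. It rests on several assumptions. Rewards are treated as the negation of losses. EVaR at confidence 1 - alpha is computed from its standard definition, EVaR(L) = inf over z > 0 of (1/z) ln(E[exp(zL)] / alpha), with the infimum taken over a finite, hypothetical grid of z values, which yields a conservative estimate. Ties are broken in favour of the less-sampled arm. The names `Z_GRID`, `evar_of_losses`, `risk_score`, `duel`, `tournament`, and `run`, along with the two toy arms, are illustrative only and are not taken from the paper or its implementation.

```python
import math
import random

# Hypothetical log-spaced grid for z in the EVaR infimum (1e-3 up to 1e3).
Z_GRID = [10 ** (-3 + 6 * k / 99) for k in range(100)]


def evar_of_losses(losses, alpha):
    """Empirical EVaR_{1-alpha} of a loss sample, minimised over Z_GRID only.

    EVaR_{1-alpha}(L) = inf_{z>0} (1/z) * ln(E[exp(z*L)] / alpha).
    Restricting z to a finite grid gives an upper (conservative) estimate.
    """
    n = len(losses)
    best = math.inf
    for z in Z_GRID:
        m = max(z * x for x in losses)  # log-sum-exp shift for numerical stability
        log_mgf = m + math.log(sum(math.exp(z * x - m) for x in losses)) - math.log(n)
        best = min(best, (log_mgf - math.log(alpha)) / z)
    return best


def risk_score(rewards, alpha):
    """Risk-adjusted value of a reward sample (higher is better): -EVaR of losses -X."""
    return -evar_of_losses([-r for r in rewards], alpha)


def duel(a, b, histories, alpha, rng):
    """BESA-style duel: score both arms on equal-size subsamples of their histories."""
    ha, hb = histories[a], histories[b]
    n = min(len(ha), len(hb))
    # Subsampling without replacement puts both arms on n observations each.
    sa = risk_score(rng.sample(ha, n), alpha)
    sb = risk_score(rng.sample(hb, n), alpha)
    if sa != sb:
        return a if sa > sb else b
    if len(ha) != len(hb):  # assumed tie-break: prefer the less-sampled arm
        return a if len(ha) < len(hb) else b
    return rng.choice([a, b])


def tournament(arms, histories, alpha, rng):
    """Recursive knockout over the arm list, reducing K arms to pairwise duels."""
    if len(arms) == 1:
        return arms[0]
    mid = len(arms) // 2
    left = tournament(arms[:mid], histories, alpha, rng)
    right = tournament(arms[mid:], histories, alpha, rng)
    return duel(left, right, histories, alpha, rng)


def run(arm_fns, horizon, alpha=0.05, seed=0):
    """Play `horizon` rounds; each arm_fn draws one reward using the shared rng."""
    rng = random.Random(seed)
    histories = [[f(rng)] for f in arm_fns]  # pull every arm once to initialise
    counts = [1] * len(arm_fns)
    for _ in range(horizon - len(arm_fns)):
        arm = tournament(list(range(len(arm_fns))), histories, alpha, rng)
        histories[arm].append(arm_fns[arm](rng))
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    safe = lambda rng: 0.5                                   # mean 0.5, no variance
    risky = lambda rng: 1.0 if rng.random() < 0.55 else 0.0  # mean 0.55, frequent zeros
    print(run([safe, risky], horizon=500))
```

In this toy setting, the risky arm has the higher mean, so a mean-based duel would tend to favour it. Its zero rewards, however, make its empirical EVaR score negative once they occur in at least an alpha fraction of the compared sample. The safe arm should therefore receive most of the pulls. This mirrors the exploration, exploitation, and safety trade-off the abstract describes.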
File: InformationSciences1-s2.0-S0020025521005399-main.pdf (publisher's version, main article; Adobe PDF, 6.99 MB; restricted access)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.