RE.PUBLIC@POLIMI Research publications of the Politecnico di Milano

Robust risk-averse multi-armed bandits with application in social engagement behavior of children with autism spectrum disorder while imitating a humanoid robot

Aryania A.; Aghdasi H. S.; Heshmati R.; Bonarini A.

2021-01-01

Abstract

The stochastic multi-armed bandit problem is a standard model for the exploration–exploitation trade-off in sequential decision problems. In clinical trials, which are sensitive to outlier data, the goal is to learn a risk-averse policy that balances exploration, exploitation, and safety. In this paper, we present a risk-averse multi-armed bandit algorithm for a decision-making problem based on the social engagement behaviors of children with Autism Spectrum Disorder (ASD). The algorithm runs while children interact with a humanoid robot and imitate a sequence of the robot's movements. It builds on the Best Empirical Sampled Average algorithm, with Entropic Value-at-Risk as the risk measure, to select the sequence of movements that best improves the children's social engagement behaviors during imitation. We provide a detailed experimental comparison of the proposed algorithm with several well-known risk-averse multi-armed bandit algorithms on artificial scenarios and on our real-world problem. The experimental results show that the proposed algorithm outperforms its competitors in terms of robustness, risk avoidance, and cumulative regret, promoting the social engagement behaviors of children with ASD when imitating a robot's movements.
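This record does not reproduce the paper's algorithm, so the following is only a minimal sketch of the two ingredients the abstract names, under stated assumptions. It uses the standard definition of Entropic Value-at-Risk for a loss X at confidence level 1 - alpha, namely the infimum over z > 0 of (1/z) ln(E[exp(zX)] / alpha), and a two-armed Best Empirical Sampled Average duel in which both reward histories are subsampled to the same size before comparison. Comparing subsampled EVaR values instead of subsampled means, treating losses as negated rewards, the grid search over z, and the names `empirical_evar` and `besa_evar_choose` are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def empirical_evar(losses, alpha=0.05, z_grid=None):
    """Approximate the empirical EVaR of a loss sample at level 1 - alpha.

    Evaluates (1/z) * ln(M(z) / alpha) on a grid of z > 0, where M(z) is the
    empirical moment-generating function, and returns the smallest value.
    The grid is an illustrative assumption, so the result is an upper
    approximation of the true infimum.
    """
    if z_grid is None:
        z_grid = [10 ** (k / 10) for k in range(-30, 21)]  # 1e-3 .. 1e2
    n = len(losses)
    best = math.inf
    for z in z_grid:
        # Log-sum-exp keeps exp() from overflowing for large z.
        m = max(z * x for x in losses)
        log_mgf = m + math.log(sum(math.exp(z * x - m) for x in losses) / n)
        best = min(best, (log_mgf - math.log(alpha)) / z)
    return best


def besa_evar_choose(history_a, history_b, alpha=0.05, rng=random):
    """Pick arm 0 (a) or 1 (b) by a BESA-style duel on subsampled EVaR.

    Both reward histories are subsampled without replacement to the size of
    the shorter one, then the arm with the lower EVaR of its losses
    (negated rewards) wins. Ties go to the arm with fewer observations.
    """
    n = min(len(history_a), len(history_b))
    sub_a = rng.sample(history_a, n)
    sub_b = rng.sample(history_b, n)
    risk_a = empirical_evar([-r for r in sub_a], alpha)
    risk_b = empirical_evar([-r for r in sub_b], alpha)
    if risk_a == risk_b:
        return 0 if len(history_a) <= len(history_b) else 1
    return 0 if risk_a < risk_b else 1
```

As a usage illustration, calling `besa_evar_choose([0.5] * 10, [-1.0, 2.5] * 5)` selects arm 0: the second arm has the higher average reward, but its occasional large losses give it a worse EVaR, which is the kind of risk-averse preference the abstract describes.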

	Year of publication
				2021
	Journal title
				INFORMATION SCIENCES
	Keywords
				Autism Spectrum Disorder
				Entropic Value-at-Risk
				Multi-Armed Bandits
				Risk measure
				Risk-averse
				Social engagement
	Appears in types:
				01.1 Journal article

Files in this item:

File	Size	Format
InformationSciences1-s2.0-S0020025521005399-main.pdf (Restricted access. Description: Main article, Publisher's version)	6.99 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1189861
