RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

This paper presents a learning-based method that uses simulation data to learn an object manipulation task using two model-free reinforcement learning (RL) algorithms. The learning performance is compared across on-policy and off-policy algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). In order to accelerate the learning process, the fine-tuning procedure is proposed that demonstrates the continuous adaptation of on-policy RL to new environments, allowing the learned policy to adapt and execute the (partially) modified task. A dense reward function is designed for the task to enable an efficient learning of the agent. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The learned control policy is demonstrated to be generalizable across multiple object geometries and initial robot/parts configurations. The approach is finally tested on a real Franka Emika Panda robot, showing the possibility to transfer the learned behavior from simulation. Experimental results show 100% of successful grasping tasks, making the proposed approach applicable to real applications.

Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning

Shahid A. A.;Piga D.;Braghin F.;Roveda L.

2022-01-01

Abstract

This paper presents a learning-based method that uses simulation data to learn an object manipulation task using two model-free reinforcement learning (RL) algorithms. The learning performance is compared across on-policy and off-policy algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). In order to accelerate the learning process, the fine-tuning procedure is proposed that demonstrates the continuous adaptation of on-policy RL to new environments, allowing the learned policy to adapt and execute the (partially) modified task. A dense reward function is designed for the task to enable an efficient learning of the agent. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The learned control policy is demonstrated to be generalizable across multiple object geometries and initial robot/parts configurations. The approach is finally tested on a real Franka Emika Panda robot, showing the possibility to transfer the learned behavior from simulation. Experimental results show 100% of successful grasping tasks, making the proposed approach applicable to real applications.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2022
			
	Titolo della rivista
	
				AUTONOMOUS ROBOTS
			
	Parole chiave
	
				Continuous control, Policy optimization, Reinforcement learning, Robotic grasping
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
s10514-022-10034-z.pdf accesso aperto : Publisher’s version Dimensione 2.05 MB Formato Adobe PDF Visualizza/Apri	2.05 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1233378

Citazioni

ND

40

17

social impact