6 DoF Pose Regression via Differentiable Rendering

Andrea Simpsi, Marco Cannici, Matteo Matteucci
2022-01-01

Abstract

Six Degrees of Freedom (6DoF) pose estimation is a crucial task in computer vision. It consists of identifying the 3D translation and rotation of an object with respect to the observer's coordinate system. When the pose is obtained from a single image, the task is called monocular 6DoF regression, and it is prominent in several fields such as robot manipulation, autonomous driving, scene reconstruction, augmented reality, and aerospace. According to the literature, the prevailing approaches are the direct regression of the object's pose from the input image, and the regression of the object's keypoints followed by a Perspective-n-Point algorithm to recover its pose. While the former requires large amounts of data to train the deep neural networks involved, the latter requires costly keypoint annotations for the objects to be regressed. In this work, we propose a new method for 6DoF pose estimation that uses differentiable rendering along the entire pipeline. First, we reconstruct the 3D model of an object with a differentiable rendering technique. Then, we use this model to enrich our dataset with new images and useful annotations, and regress a first estimate of the 6DoF pose. Finally, we refine this coarse pose with a render-and-compare approach based on differentiable rendering. We tested our method on ESA's Pose Estimation Challenge using the SPEED dataset. Our approach achieves competitive results on the benchmark challenge, and the render-and-compare step is shown to enhance the performance of existing state-of-the-art algorithms.
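As a hedged illustration of the render-and-compare idea described in the abstract, the sketch below refines a coarse translation estimate by descending the reprojection error of a sparse point cloud under a pinhole camera. Forward-difference gradients stand in for the analytic gradients a true differentiable renderer would supply, and all names, constants, and the toy setup are illustrative, not the paper's implementation.

```python
import numpy as np

# Toy render-and-compare refinement (illustrative only): refine a coarse
# translation estimate by descending the mean squared reprojection error
# between a "rendered" and an observed view of a sparse 3D point cloud.
# A real pipeline differentiates through a full renderer; here a pinhole
# projection stands in, and forward-difference gradients replace the
# analytic ones a differentiable renderer would provide.

F = 500.0  # assumed focal length, in pixels

def render(points, t):
    """Pinhole projection of 3D points shifted by translation t."""
    p = points + t
    return F * p[:, :2] / p[:, 2:3]  # perspective divide

def refine(points, observed, t0, lr=5e-5, steps=500, eps=1e-6):
    """Minimise the mean squared reprojection error over the translation."""
    t = t0.astype(float).copy()
    for _ in range(steps):
        base = np.mean((render(points, t) - observed) ** 2)
        grad = np.zeros(3)
        for i in range(3):  # forward differences, one axis at a time
            d = np.zeros(3)
            d[i] = eps
            shifted = np.mean((render(points, t + d) - observed) ** 2)
            grad[i] = (shifted - base) / eps
        t -= lr * grad
    return t

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, (50, 3)) + np.array([0.0, 0.0, 5.0])
t_true = np.array([0.2, -0.1, 0.3])
observed = render(cloud, t_true)          # the "image" seen by the camera
t_refined = refine(cloud, observed, t0=np.zeros(3))
```

The same loop generalises to rotation parameters and to dense photometric losses once the projection step is replaced by a genuinely differentiable renderer, which is the setting the paper operates in.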
International Conference on Image Analysis and Processing
978-3-031-06429-6
Files in this product:
6 DoF Pose Regression via Differentiable Rendering.pdf — open access, Post-Print (DRAFT or Author's Accepted Manuscript, AAM), 1.84 MB, Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1208053
Citations
  • Scopus: 2
  • Web of Science: 0