Background and Objective: Safety of robotic surgery can be enhanced through augmented vision or artificial constraints to the robotl motion, and intra-operative depth estimation is the cornerstone of these applications because it provides precise position information of surgical scenes in 3D space. High-quality depth estimation of endoscopic scenes has been a valuable issue, and the development of deep learning provides more possibility and potential to address this issue. Methods: In this paper, a deep learning-based approach is proposed to recover 3D information of intra-operative scenes. To this aim, a fully 3D encoder-decoder network integrating spatio-temporal layers is designed, and it adopts hierarchical prediction and progressive learning to enhance prediction accuracy and shorten training time. Results: Our network gets the depth estimation accuracy of MAE 2.55±1.51 (mm) and RMSE 5.23±1.40 (mm) using 8 surgical videos with a resolution of 1280×1024, which performs better compared with six other state-of-the-art methods that were trained on the same data. Conclusions: Our network can implement a promising depth estimation performance in intra-operative scenes using stereo images, allowing the integration in robot-assisted surgery to enhance safety.

Spatio-temporal layers based intra-operative stereo depth estimation network via hierarchical prediction and progressive training

Ziyang Chen;Laura Cruciani;Giancarlo Ferrigno;Elena De Momi
2024-01-01

Abstract

Background and Objective: Safety of robotic surgery can be enhanced through augmented vision or artificial constraints to the robotl motion, and intra-operative depth estimation is the cornerstone of these applications because it provides precise position information of surgical scenes in 3D space. High-quality depth estimation of endoscopic scenes has been a valuable issue, and the development of deep learning provides more possibility and potential to address this issue. Methods: In this paper, a deep learning-based approach is proposed to recover 3D information of intra-operative scenes. To this aim, a fully 3D encoder-decoder network integrating spatio-temporal layers is designed, and it adopts hierarchical prediction and progressive learning to enhance prediction accuracy and shorten training time. Results: Our network gets the depth estimation accuracy of MAE 2.55±1.51 (mm) and RMSE 5.23±1.40 (mm) using 8 surgical videos with a resolution of 1280×1024, which performs better compared with six other state-of-the-art methods that were trained on the same data. Conclusions: Our network can implement a promising depth estimation performance in intra-operative scenes using stereo images, allowing the integration in robot-assisted surgery to enhance safety.
2024
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S016926072300603X-main.pdf

accesso aperto

Dimensione 3.5 MB
Formato Adobe PDF
3.5 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1259858
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 3
social impact