RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

Surgical-tool joint detection from laparoscopic images is an important but challenging task in computer-assisted minimally invasive surgery. Illumination levels, variations in background and the different number of tools in the field of view, all pose difficulties to algorithm and model training. Yet, such challenges could be potentially tackled by exploiting the temporal information in laparoscopic videos to avoid per frame handling of the problem. In this letter, we propose a novel encoder-decoder architecture for surgical instrument joint detection and localization that uses three-dimensional convolutional layers to exploit spatio-temporal features from laparoscopic videos. When tested on benchmark and custom-built datasets, a median Dice similarity coefficient of 85.1% with an interquartile range of 4.6% highlights performance better than the state of the art based on single-frame processing. Alongside novelty of the network architecture, the idea for inclusion of temporal information appears to be particularly useful when processing images with unseen backgrounds during the training phase, which indicates that spatio-temporal features for joint detection help to generalize the solution.

Deep Learning Based Robotic Tool Detection and Articulation Estimation with Spatio-Temporal Layers

COLLEONI, EMANUELE;Moccia S.;DU, XIAN;De Momi E.;Stoyanov D.

2019-01-01

Abstract

Surgical-tool joint detection from laparoscopic images is an important but challenging task in computer-assisted minimally invasive surgery. Illumination levels, variations in background and the different number of tools in the field of view, all pose difficulties to algorithm and model training. Yet, such challenges could be potentially tackled by exploiting the temporal information in laparoscopic videos to avoid per frame handling of the problem. In this letter, we propose a novel encoder-decoder architecture for surgical instrument joint detection and localization that uses three-dimensional convolutional layers to exploit spatio-temporal features from laparoscopic videos. When tested on benchmark and custom-built datasets, a median Dice similarity coefficient of 85.1% with an interquartile range of 4.6% highlights performance better than the state of the art based on single-frame processing. Alongside novelty of the network architecture, the idea for inclusion of temporal information appears to be particularly useful when processing images with unseen backgrounds during the training phase, which indicates that spatio-temporal features for joint detection help to generalize the solution.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2019
			
	Titolo della rivista
	
				IEEE ROBOTICS AND AUTOMATION LETTERS
			
	Parole chiave
	
				computer assisted interventions; medical robotics; minimally invasive surgery; surgical vision; Surgical-tool detection
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
Deep Learning Based Robotic Tool Detection.pdf Accesso riservato : Publisher’s version Dimensione 4.06 MB Formato Adobe PDF Visualizza/Apri	4.06 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1099416

Citazioni

ND

85

58

social impact