RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano

Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting

A. Marelli; L. Magri; F. Arrigoni; G. Boracchi

2025-01-01

Abstract

In industrial settings, weakly supervised (WS) methods are usually preferred over their fully supervised (FS) counterparts as they do not require costly manual annotations. Unfortunately, the segmentation masks obtained in the WS regime are typically poor in terms of accuracy. In this work, we present a WS method capable of producing accurate masks for semantic segmentation of video streams. More specifically, we build saliency maps that exploit the temporal coherence between consecutive frames in a video, promoting consistency when objects appear in different frames. We apply our method in a waste-sorting scenario, where we perform weakly supervised video segmentation (WSVS) by training an auxiliary classifier that distinguishes between videos recorded before and after a human operator manually removes specific waste items from a conveyor belt. The saliency maps of this classifier identify the materials to be removed, and we modify the classifier training to minimize differences between the saliency map of a central frame and those of adjacent frames, after compensating for object displacement. Experiments on a real-world dataset demonstrate the benefits of integrating temporal coherence directly into the training phase of the classifier. Code and dataset are available upon request.
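
The temporal-consistency idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss or the motion-compensation scheme, so everything below is an assumption made for illustration: the helper names `shift_cam` and `temporal_consistency_loss` are hypothetical, object displacement is modeled as a known horizontal shift in whole pixels (as might hold for a conveyor belt moving at constant speed), and the discrepancy is measured as a mean squared difference over the overlapping region. NumPy is used only to show the computation; an actual training loss would need to be written in a differentiable framework.

```python
import numpy as np


def shift_cam(cam, dx):
    """Shift a 2-D saliency map by dx pixels along the column axis.

    Returns the shifted map and a boolean mask marking the pixels that
    received real values (pixels shifted in from outside are invalid).
    """
    h, w = cam.shape
    shifted = np.zeros_like(cam)
    valid = np.zeros((h, w), dtype=bool)
    if abs(dx) >= w:
        # The shift moves the whole map out of view: nothing overlaps.
        return shifted, valid
    if dx >= 0:
        shifted[:, dx:] = cam[:, : w - dx]
        valid[:, dx:] = True
    else:
        shifted[:, : w + dx] = cam[:, -dx:]
        valid[:, : w + dx] = True
    return shifted, valid


def temporal_consistency_loss(central_cam, adjacent_cams, displacements):
    """Average squared difference between the central map and each
    adjacent map after aligning the adjacent map to the central frame.

    displacements[i] is the (assumed known) horizontal shift, in pixels,
    that aligns adjacent_cams[i] with the central frame.
    """
    total, count = 0.0, 0
    for cam, dx in zip(adjacent_cams, displacements):
        aligned, valid = shift_cam(cam, dx)
        if valid.any():
            diff = central_cam[valid] - aligned[valid]
            total += float(np.mean(diff ** 2))
            count += 1
    return total / count if count else 0.0


# Toy example: the same map observed 2 pixels earlier and 2 pixels later.
central = np.arange(40, dtype=float).reshape(4, 10)
previous = np.roll(central, -2, axis=1)
following = np.roll(central, 2, axis=1)
aligned_loss = temporal_consistency_loss(central, [previous, following], [2, -2])
unaligned_loss = temporal_consistency_loss(central, [previous, following], [0, 0])
```

In this toy case the loss vanishes when the adjacent maps are shifted by the correct displacement and is positive when the displacement is ignored, which is the behavior a temporal-consistency term is meant to reward.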

Year of publication: 2025
Book title: Computer Vision – ECCV 2024 Workshops
ISBN (International Standard Book Number): 978-3-031-92804-8
Keywords: Waste sorting, Weakly supervised video segmentation, Class activation maps
Appears in types: 04.1 Contribution in conference proceedings

Files in this item:

File	Description	Access	Size	Format
2024_05_TemporalCAM_ECCV_WRKSP.pdf	Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting: Pre-Print (or Pre-Refereeing)	Restricted access	10.09 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1288885
