Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting

A. Marelli;L. Magri;F. Arrigoni;G. Boracchi
2025-01-01

Abstract

In industrial settings, weakly supervised (WS) methods are usually preferred over their fully supervised (FS) counterparts because they do not require costly manual annotations. Unfortunately, the segmentation masks obtained in the WS regime are typically inaccurate. In this work, we present a WS method capable of producing accurate masks for semantic segmentation of video streams. More specifically, we build saliency maps that exploit the temporal coherence between consecutive frames in a video, promoting consistency when objects appear in different frames. We apply our method in a waste-sorting scenario, where we perform weakly supervised video segmentation (WSVS) by training an auxiliary classifier that distinguishes between videos recorded before and after a human operator manually removes specific waste items from a conveyor belt. The saliency maps of this classifier identify the materials to be removed, and we modify the classifier training to minimize the differences between the saliency map of a central frame and those of adjacent frames, after compensating for object displacement. Experiments on a real-world dataset demonstrate the benefits of integrating temporal coherence directly into the training phase of the classifier. Code and dataset are available upon request.
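The temporal-consistency idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss, so the mean squared difference used here, the assumption that object displacement is a known integer shift along the belt direction, and the names `shift_columns` and `temporal_consistency_loss` are all hypothetical choices made for illustration.

```python
import numpy as np


def shift_columns(cam, dx):
    """Shift a 2-D map by dx pixels along the width axis.

    Columns with no source data are set to NaN so they can be ignored later.
    (Assumption: displacement is a pure integer shift along the belt direction.)
    """
    out = np.full(cam.shape, np.nan, dtype=float)
    w = cam.shape[1]
    if dx == 0:
        out[:] = cam
    elif 0 < dx < w:
        out[:, dx:] = cam[:, :w - dx]
    elif -w < dx < 0:
        out[:, :w + dx] = cam[:, -dx:]
    return out


def temporal_consistency_loss(central_cam, neighbor_cams, displacements):
    """Average squared difference between the central saliency map and each
    displacement-compensated neighbor map, over the overlapping pixels only.

    (Assumption: mean squared error is used as the difference measure.)
    """
    losses = []
    for cam, dx in zip(neighbor_cams, displacements):
        aligned = shift_columns(cam, dx)
        valid = ~np.isnan(aligned)
        if valid.any():
            diff = central_cam[valid] - aligned[valid]
            losses.append(np.mean(diff ** 2))
    return float(np.mean(losses)) if losses else 0.0


# Toy example: an object moving one pixel to the right per frame.
central = np.zeros((4, 6))
central[1:3, 2:4] = 1.0
previous = np.zeros((4, 6))
previous[1:3, 1:3] = 1.0
following = np.zeros((4, 6))
following[1:3, 3:5] = 1.0

aligned_loss = temporal_consistency_loss(central, [previous, following], [1, -1])
unaligned_loss = temporal_consistency_loss(central, [previous, following], [0, 0])
```

In this toy case the loss vanishes once the neighbor maps are shifted back onto the central frame, and it is positive when the displacement is ignored. In a training setting, a term of this kind would presumably be added to the classification loss of the auxiliary classifier, with the saliency maps computed in a differentiable framework rather than NumPy.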
2025
Computer Vision – ECCV 2024 Workshops
978-3-031-92804-8
Waste sorting, Weakly supervised video segmentation, Class activation maps
Files in this item:

File: 2024_05_TemporalCAM_ECCV_WRKSP.pdf
Access: restricted
Description: Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting
Type: Pre-print (or pre-refereeing)
Size: 10.09 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1288885