Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting
A. Marelli;L. Magri;F. Arrigoni;G. Boracchi
2025-01-01
Abstract
In industrial settings, weakly supervised (WS) methods are usually preferred over their fully supervised (FS) counterparts, as they do not require costly manual annotations. Unfortunately, the segmentation masks obtained in the WS regime are typically poor in terms of accuracy. In this work, we present a WS method capable of producing accurate masks for semantic segmentation of video streams. More specifically, we build saliency maps that exploit the temporal coherence between consecutive frames in a video, promoting consistency when objects appear in different frames. We apply our method in a waste-sorting scenario, where we perform weakly supervised video segmentation (WSVS) by training an auxiliary classifier that distinguishes between videos recorded before and after a human operator manually removes specific waste items from a conveyor belt. The saliency maps of this classifier identify the materials to be removed, and we modify the classifier training to minimize the differences between the saliency map of a central frame and those of adjacent frames, after compensating for object displacement. Experiments on a real-world dataset demonstrate the benefits of integrating temporal coherence directly into the training phase of the classifier. Code and dataset are available upon request.

| File | Size | Format |
|---|---|---|
| 2024_05_TemporalCAM_ECCV_WRKSP.pdf (restricted access). Description: Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting. Type: Pre-print (pre-refereeing) | 10.09 MB | Adobe PDF |
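The temporal-consistency idea in the abstract (penalizing differences between the saliency map of a central frame and those of adjacent frames, after compensating for object displacement) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss, the motion model, or how the saliency maps are computed. The sketch assumes a purely horizontal, constant displacement of `dx_per_frame` pixels between consecutive frames (a hypothetical conveyor-belt model) and a mean squared difference over the overlapping region. The names `consistency_loss` and `temporal_consistency` are illustrative.

```python
from typing import Sequence

SaliencyMap = Sequence[Sequence[float]]  # rows x columns


def consistency_loss(center: SaliencyMap, neighbor: SaliencyMap, dx: int) -> float:
    """Mean squared difference between two saliency maps after undoing a shift.

    Assumed motion model (hypothetical): content at column x in `center`
    appears at column x + dx in `neighbor`.
    """
    width = len(center[0])
    total, count = 0.0, 0
    for row_c, row_n in zip(center, neighbor):
        for x in range(width):
            xn = x + dx
            if 0 <= xn < width:  # ignore pixels that leave the field of view
                total += (row_c[x] - row_n[xn]) ** 2
                count += 1
    return total / count if count else 0.0


def temporal_consistency(maps: Sequence[SaliencyMap], center_idx: int,
                         dx_per_frame: int) -> float:
    """Average the pairwise loss between the central map and every other map."""
    losses = [
        consistency_loss(maps[center_idx], m, (i - center_idx) * dx_per_frame)
        for i, m in enumerate(maps)
        if i != center_idx
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Toy window: a salient column that moves one pixel to the right per frame.
prev = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
center = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
nxt = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

consistent = temporal_consistency([prev, center, nxt], center_idx=1, dx_per_frame=1)
# A map that fails to follow the object is penalized.
inconsistent = consistency_loss(center, center, dx=1)
```

In the toy window the saliency follows the object exactly, so `consistent` is zero, while `inconsistent` is positive. In a training setup, a term of this kind would presumably be added to the classification loss with a weighting factor. That weighting is also an assumption here.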
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


