
Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images

Byrne, Sean Anthony; Carminati, Marco
2025-01-01

Abstract

We explore the transformative potential of SAM 2, a vision foundation model, in advancing gaze estimation. SAM 2 addresses key challenges in gaze estimation by significantly reducing annotation time, simplifying deployment, and enhancing segmentation accuracy. Utilizing its zero-shot capabilities with minimal user input—a single click per video—we tested SAM 2 on over 14 million eye images from a diverse range of datasets, including the EDS challenge datasets and Labelled Pupils in the Wild. This is the first application of SAM 2 to the gaze estimation domain. Remarkably, SAM 2 matches the performance of domain-specific models in pupil segmentation, achieving competitive mIOU scores of up to 93% without fine-tuning. We argue that SAM 2 achieves the sought-after standard of domain generalization, with consistent mIOU scores (89.71%-93.74%) across diverse datasets, from virtual reality to "gaze-in-the-wild" scenarios. We provide our code and segmentation masks for these datasets to promote further research.
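The mIOU figures reported above (89.71%-93.74%) are the standard mean intersection-over-union metric for segmentation masks. As a minimal sketch of how such a score is computed over predicted and ground-truth pupil masks (the function and variable names here are illustrative, not taken from the paper's released code):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two boolean segmentation masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two empty masks agree perfectly by convention.
    return 1.0 if union == 0 else float(intersection) / float(union)

def mean_iou(pairs) -> float:
    """Mean IoU (mIoU) over (prediction, ground-truth) mask pairs."""
    return float(np.mean([mask_iou(p, g) for p, g in pairs]))

# Toy example: a 4x4 predicted pupil mask vs. a slightly wider ground truth.
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True   # 4 predicted pixels
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:4] = True     # 6 ground-truth pixels; 4 overlap
print(round(mask_iou(pred, gt), 3))  # → 0.667
```

Averaging `mask_iou` over all frames of a dataset yields the per-dataset mIOU values quoted in the abstract.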
Proceedings of the ACM on Computer Graphics and Interactive Techniques
Eye Tracking
Foundation Models
Gaze Estimation
Methods
Files in this record:

3729409.pdf — Publisher's version, open access, 2.96 MB, Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1309551
Citations
  • PMC: ND
  • Scopus: 2
  • Web of Science: 3