Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images
Byrne, Sean Anthony; Carminati, Marco
2025-01-01
Abstract
We explore the transformative potential of SAM 2, a vision foundation model, in advancing gaze estimation. SAM 2 addresses key challenges in gaze estimation by significantly reducing annotation time, simplifying deployment, and enhancing segmentation accuracy. Utilizing its zero-shot capabilities with minimal user input (a single click per video), we tested SAM 2 on over 14 million eye images from a diverse range of datasets, including the EDS challenge datasets and Labelled Pupils in the Wild. This is the first application of SAM 2 to the gaze estimation domain. Remarkably, SAM 2 matches the performance of domain-specific models in pupil segmentation, achieving competitive mIOU scores of up to 93% without fine-tuning. We argue that SAM 2 achieves the sought-after standard of domain generalization, with consistent mIOU scores (89.71%-93.74%) across diverse datasets, from virtual reality to "gaze-in-the-wild" scenarios. We provide our code and segmentation masks for these datasets to promote further research.
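The abstract reports mIOU (mean Intersection over Union) scores for pupil segmentation. The paper's own evaluation code is not reproduced on this page; the following is a minimal sketch of how per-image mIOU for binary pupil masks is commonly computed, averaging IoU over the two classes (pupil and background) — the class convention here is our assumption, not taken from the paper:

```python
import numpy as np

def miou(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean IoU of a predicted binary mask against a ground-truth binary mask.

    Averages the IoU of the foreground (pupil) and background classes.
    Classes absent from both masks are skipped.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    ious = []
    for cls in (True, False):  # pupil, then background
        p, t = pred == cls, target == cls
        union = np.logical_or(p, t).sum()
        if union:  # skip a class that appears in neither mask
            inter = np.logical_and(p, t).sum()
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example: a 2x2 image where one pupil pixel is over-predicted.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(miou(pred, target))  # pupil IoU 1/2, background IoU 2/3
```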
| File | Size | Format |
|---|---|---|
| 3729409.pdf (open access: Publisher's version) | 2.96 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


