EgoLife: Towards Egocentric Life Assistant

Cominelli, Marco;
2025-01-01

Abstract

We introduce EgoLife, a project to develop an egocentric life assistant that accompanies and enhances personal efficiency through AI-powered wearable glasses. To lay the foundation for this assistant, we conducted a comprehensive data collection study where six participants lived together for one week, continuously recording their daily activities (including discussions, shopping, cooking, socializing, and entertainment) using AI glasses for multimodal egocentric video capture, along with synchronized third-person-view video references. This effort resulted in the EgoLife Dataset, a comprehensive 300-hour egocentric, interpersonal, multiview, and multimodal daily life dataset with intensive annotation. Leveraging this dataset, we introduce EgoLifeQA, a suite of long-context, life-oriented question-answering tasks designed to provide meaningful assistance in daily life by addressing practical questions such as recalling past relevant events, monitoring health, and offering personalized recommendations. To address the key technical challenges of 1) developing robust visual-audio models for egocentric data, 2) enabling identity recognition, and 3) facilitating long-context question answering over extensive temporal information, we introduce EgoButler, an integrated system comprising EgoGPT and EgoRAG. EgoGPT is an omni-modal model trained on egocentric datasets, achieving state-of-the-art performance on egocentric video understanding. EgoRAG is a retrieval-based component that supports answering ultra-long-context questions. Our experimental studies verify their working mechanisms and reveal critical factors and bottlenecks, guiding future improvements. By releasing our datasets, models, and benchmarks, we aim to stimulate further research in egocentric AI assistants.
2025
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1309649