Towards Real-time Multimodal Contextual Assistants
Advancing egocentric vision, proactive AI, and long-context multimodal understanding for next-generation wearable devices.
About the Workshop
🏆 Your chance to work with the largest wearables dataset ever released — 5,000+ hours of real-world egocentric video — and compete for $21K in total cash prizes across three grand challenges that define the frontier of Wearable AI.
Imagine an AI that watches the world through your eyes, remembers your entire day, and proactively helps before you even ask. That future is being built right now — and this workshop is where the key breakthroughs will be discussed. If you work on egocentric video, multimodal LLMs, conversational AI, or on-device inference, this is your workshop.
Three things make this workshop stand out: (1) it brings together researchers from egocentric vision, long-form video Q&A, and conversational AI in one place; (2) it introduces high-impact benchmark tasks — proactive AI and streaming multi-turn dialog — that push the frontier of Wearable AI; (3) every participant gets a ready-to-use Participant Toolkit and baseline models to hit the ground running. Come ready to collaborate at one of the most vibrant sessions at ECCV.
Research Areas
The workshop welcomes submissions on the following topics, among others:
Grand Challenge
Wearable AI is one of the most exciting and challenging frontiers in computer vision and AI today. Yet progress has been hampered by a lack of large-scale, realistic benchmarks that reflect the complexity of real-world egocentric experiences — long-duration video, proactive AI interactions, and multi-turn conversations. This grand challenge is designed to close that gap: to catalyze the community around concrete, measurable tasks and to reward reproducible, open-sourced breakthroughs.
In addition to general research paper submissions, our workshop features three specific grand challenge tasks. A total of $21,000 in cash prizes will be awarded to winners with reproducible, open-sourced results. 90% of the data is released as train+dev, with 10% kept hidden for evaluation.
Given a long egocentric video and user requests, predict proactive engagements (e.g., step-by-step instructions, object finding) at appropriate moments to maximize utility while minimizing disruption.
Given a long egocentric video interleaved with user–assistant interactions, predict answers to user questions in a streaming fashion, maintaining coherence with past context.
Given 30+ minute egocentric videos, answer questions posed at the end, which requires temporal grounding and recall of events spanning the full video.
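The three tasks above share a common streaming structure: timestamped events (video frames and user turns) arrive in order, and at each step the assistant must stay silent, answer a user turn, or engage proactively. The sketch below is a minimal illustration of that framing; the `Event` and `StreamingAssistant` names and the trivial cooldown policy are our own assumptions for exposition, not part of the challenge API.

```python
# Illustrative only: a toy streaming-assistant loop matching the task
# descriptions above. All names here are hypothetical, not a released API.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    """One timestamped item from the egocentric stream."""
    timestamp_s: float
    frame: object = None                   # decoded video frame (placeholder type)
    user_utterance: Optional[str] = None   # present only on dialog turns


class StreamingAssistant:
    """Toy policy: answer user turns, and engage proactively at most once
    per `cooldown_s` seconds to limit disruption."""

    def __init__(self, cooldown_s: float = 60.0):
        self.cooldown_s = cooldown_s
        self.last_engagement_s = float("-inf")

    def step(self, event: Event) -> Optional[str]:
        if event.user_utterance is not None:
            # Reactive path (Tasks 2 and 3): answer using accumulated context.
            return f"[answer to: {event.user_utterance!r}]"
        if event.timestamp_s - self.last_engagement_s >= self.cooldown_s:
            # Proactive path (Task 1): decide whether an interjection is useful.
            self.last_engagement_s = event.timestamp_s
            return "[proactive suggestion]"
        return None  # stay silent


def run(assistant: StreamingAssistant, stream: Iterable[Event]) -> None:
    """Consume events in timestamp order and print any assistant responses."""
    for event in stream:
        response = assistant.step(event)
        if response is not None:
            print(f"{event.timestamp_s:8.1f}s  {response}")
```

In a real system, a learned policy would replace the cooldown heuristic, scoring candidate engagements against the user's ongoing activity to balance utility against disruption.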
To facilitate this challenge, we are releasing the largest wearable AI dataset ever collected — three newly annotated, large-scale collections that are the first to combine long-duration egocentric video with real user–assistant conversations and proactive AI annotations.
| Dataset | Egocentric | Long QA | User-AI Conv. | Proactive | # Hours |
|---|---|---|---|---|---|
| EPIC-KITCHENS | ✓ | ✗ | ✗ | ✗ | 0.1K |
| EgoSchema | ✓ | ✓ | ✗ | ✗ | 0.3K |
| Ego-Exo4D | ✓ | ✓ | ✗ | ✗ | 1.2K |
| Ego4D | ✓ | ✓ | ✗ | ✗ | 3.5K |
| Wearable AI | ✓ | ✓ | ✓ | ✓ | 5K |
Every registered participant will receive our Participant Toolkit: standardized data loaders, evaluation scripts, and a suite of MLLM baseline models covering diverse architectures.
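As a rough picture of how the toolkit might be used, the sketch below shows a generic predict-and-submit loop over a released split. Everything named here (the dummy model, field names like `video_path` and `dialog`, and the commented loader and evaluation calls) is an assumption for illustration; consult the toolkit documentation for the actual interfaces.

```python
# Hypothetical usage sketch: the module, function, and field names below are
# placeholders invented for illustration, not the toolkit's actual API.
import json


def dummy_model(video_path: str, dialog: list) -> str:
    """Stand-in for one of the provided MLLM baselines; returns a fixed answer."""
    return "I don't know."


def predict_all(examples: list) -> dict:
    """Run a model over every example in a split and collect its predictions."""
    predictions = {}
    for ex in examples:
        # Each example is assumed to pair a long egocentric video with its
        # interleaved user-assistant turns; these field names are placeholders.
        predictions[ex["id"]] = dummy_model(ex["video_path"], ex["dialog"])
    return predictions


if __name__ == "__main__":
    # With the released toolkit, one would presumably load a split and score
    # predictions with the provided evaluation scripts, e.g.
    #   examples = load_split("dev")               # placeholder loader name
    #   print(evaluate(predict_all(examples)))     # placeholder metric call
    examples = [{"id": "ex0", "video_path": "day1.mp4", "dialog": []}]
    with open("submission.json", "w") as f:
        json.dump(predict_all(examples), f)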
Register below to participate in the workshop. Individual registrations are open to all researchers and students. If you are participating in the Grand Challenge with a team, provide your team name in the registration form.
Invited Talks

Leading researcher in computer vision, renowned for her pivotal role in developing and leading the Ego4D and Ego-Exo4D projects.

Distinguished professor in computer vision, renowned for creating and leading the EPIC-KITCHENS dataset. Expert in egocentric video analysis and action recognition.

Distinguished Engineer at Meta Reality Labs, leading Generative AI and Wearable AI. Frequently invited keynote speaker at ECCV, CVPR, and ICCV. Work featured in BBC, TIME, and MIT Technology Review.
Additional keynote speakers to be announced.
Workshop Committee
The organizing committee brings together world-class researchers from Meta Reality Labs, the University of Edinburgh, HKUST, Georgia Tech, and UCF — spanning multimodal AI, computer vision, robotics, and conversational systems.

Research Scientist/Technical Lead at Meta Reality Labs. Work focuses on large vision encoders and multimodal foundation models for real-time perception and conversational AI on wearables.

Tech Lead at Meta Reality Labs, driving AI modeling and agentic systems for multimodal proactive assistants. His current work focuses on self-improvement methods in multi-agent systems for wearables.

Lead AI Research Scientist at Meta Reality Labs. Ph.D. in ML and Language Technologies from CMU. Organized workshops at AAAI, KDD, ICASSP, and NeurIPS.

Distinguished Engineer at Meta Reality Labs, leading Generative AI and Wearable AI. Frequently invited keynote speaker at ECCV, CVPR, and ICCV. Work featured in BBC, TIME, and MIT Technology Review.

Assistant Professor, ELLIS member, and GAIL Fellow. Research focuses on multimodal generative AI for robotics. Has collaborated with Amazon Alexa AI and the European Space Agency.

Professor at Georgia Tech and founder of the Robotics Perception and Learning lab. Research includes robust finetuning of VLMs, open-world generalization, and Vision-Language-Action models.

Professor at HKUST directing research in conversational AI, NLP, and human-robot interaction. IEEE Fellow. Pioneering work in multilingual speech and language technologies.

Founding director of the Center for Research in Computer Vision and Trustee Chair Professor of CS at UCF. Fellow of NAI, IEEE, AAAS, IAPR and SPIE.
Schedule
Full-day workshop with 4 invited talks, 2 oral presentations, 2 grand challenge talks, 1 poster session, and a panel discussion. The schedule below is tentative and subject to change.
Timeline