Journal article
ActiveEye: enabling continuous and responsive video understanding for smart eyewear systems
- Abstract:
- Integrating vision-language models (VLMs) with wearable devices offers great potential for continuous and responsive video understanding, a key capability for applications such as smart eyewear-based conversational assistants. However, achieving this on resource-constrained devices is challenging due to the high energy demands of continuous spatial-temporal sampling and transmission. We propose ActiveEye, a VLM designed for energy-efficient and responsive video understanding. ActiveEye separates visual and motion semantic representations and incorporates an active perception-based feedback path to adaptively adjust spatial-temporal sampling and transmission rates. Implemented as a wearable-mobile-cloud system, ActiveEye is evaluated for energy efficiency, real-time semantic change detection, and video understanding in both laboratory and field studies. Using the EgoSchema dataset, ActiveEye reduces the front-end energy consumption by 49.14%, supporting 8.37 hours of continuous operation on a 2.1 Wh battery. It achieves the highest F1 score (0.80) and the lowest average time difference (1.30 s) compared with heuristic-based event detection algorithms, validating its timely semantic detection. Furthermore, ActiveEye achieves a visual question answering (VQA) accuracy of 61.6%, which is comparable to state-of-the-art VLM agents, despite their reliance on larger language decoders and more computationally intensive frame selection strategies. Two rounds of in-field user evaluations further confirm its effectiveness in real-world settings, demonstrating its practical viability as a continuous and responsive video understanding system, conversational assistant, and wearable companion.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
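The battery figures quoted in the abstract imply an average power budget, which the short Python sketch below works out. It is an illustration rather than anything taken from the paper: the function name `average_power_w` is hypothetical, and the calculation assumes the 2.1 Wh battery is drained fully and at a constant rate over the 8.37 hours of operation.

```python
def average_power_w(capacity_wh: float, runtime_h: float) -> float:
    """Average power draw in watts.

    Assumption (not from the paper): the battery is fully drained
    at a constant rate over the whole runtime.
    """
    if runtime_h <= 0:
        raise ValueError("runtime_h must be positive")
    return capacity_wh / runtime_h


# Figures from the abstract: 2.1 Wh battery, 8.37 hours of continuous operation.
power_w = average_power_w(2.1, 8.37)
print(f"Implied average draw: {power_w * 1000:.0f} mW")
```

Under these assumptions the implied average draw is about 0.25 W, which gives a rough sense of the power envelope the abstract describes.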
Access Document
- Files:
- Accepted manuscript (pdf, 10.5MB)
- Publisher copy:
- 10.1145/3770641
Authors
- Funder:
- Department for Science, Innovation and Technology
- Funder identifier:
- https://ror.org/028z36n30
- Grant:
- K250071-101
- Publisher:
- Association for Computing Machinery
- Journal:
- Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
- Volume:
- 9
- Issue:
- 4
- Pages:
- 1-33
- Article number:
- 228
- Publication date:
- 2025-12-02
- Acceptance date:
- 2025-09-19
- DOI:
- 10.1145/3770641
- EISSN:
- 2474-9567
- Language:
- English
- Keywords:
- Pubs id:
- 2348944
- Local pid:
- pubs:2348944
- Deposit date:
- 2026-03-09
- ARK identifier:
Terms of use
- Copyright holder:
- Xu et al.
- Copyright date:
- 2025
- Rights statement:
- © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)