Conference item
Chirality in action: time-aware video representation learning by latent straightening
- Abstract:
- Our objective is to develop compact video representations that are sensitive to visual change over time. To measure such time-sensitivity, we introduce a new task: chiral action recognition, where one needs to distinguish between a pair of temporally opposite actions, such as “opening vs. closing a door”, “approaching vs. moving away from something”, “folding vs. unfolding paper”, etc. Such actions (i) occur frequently in everyday life, (ii) require understanding of simple visual change over time (in object state, size, spatial position, count, ...), and (iii) are known to be poorly represented by many video embeddings. Our goal is to build time-aware video representations which offer linear separability between these chiral pairs. To that end, we propose a self-supervised adaptation recipe to inject time-sensitivity into a sequence of frozen image features. Our model is based on an auto-encoder whose latent space has an inductive bias inspired by perceptual straightening. We show that this results in a compact but time-sensitive video representation for the proposed task across three datasets: Something-Something, EPIC-Kitchens, and Charades. Our method (i) outperforms much larger video models pre-trained on large-scale video datasets, and (ii) leads to an improvement in classification performance on standard benchmarks when combined with these existing models.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Files:
- Accepted manuscript (pdf, 9.1MB)
Authors
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/T028572/1
- Publisher:
- Neural Information Processing Systems Foundation
- Host title:
- Advances in Neural Information Processing Systems 38
- Publication date:
- 2026-05-01
- Acceptance date:
- 2025-09-18
- Event title:
- 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
- Event location:
- San Diego, California, USA & Mexico City, Mexico
- Event website:
- https://neurips.cc/Conferences/2025
- Event start date:
- 2025-11-30
- Event end date:
- 2025-12-05
- Language:
- English
- Pubs id:
- 2299879
- Local pid:
- pubs:2299879
- Deposit date:
- 2025-10-15
- ARK identifier:
Terms of use
- Copyright holder:
- Bagad and Zisserman
- Copyright date:
- 2026
- Rights statement:
- © (2026) by individual authors and Neural Information Processing Systems Foundation Inc. All rights reserved.
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)