Conference item
Made to order: discovering monotonic temporal changes via self-supervised video ordering
- Abstract:
- Our objective is to discover and localize monotonic temporal changes in a sequence of images. To achieve this, we exploit a simple proxy task of ordering a shuffled image sequence, with ‘time’ serving as a supervisory signal, since only changes that are monotonic with time can give rise to the correct ordering. We also introduce a transformer-based model for ordering of image sequences of arbitrary length with built-in attribution maps. After training, the model successfully discovers and localizes monotonic changes while ignoring cyclic and stochastic ones. We demonstrate applications of the model in multiple domains covering different scene and object types, discovering both object-level and environmental changes in unseen sequences. We also demonstrate that the attention-based attribution maps function as effective prompts for segmenting the changing regions, and that the learned representations can be used for downstream applications. Finally, we show that the model achieves state-of-the-art performance on standard benchmarks for image ordering.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Authors
- Publisher:
- Springer
- Host title:
- 18th European Conference, Milan, Italy, September 29 – October 4, 2024, Proceedings, Part LXXIV
- Pages:
- 268–286
- Series:
- Lecture Notes in Computer Science
- Series number:
- 15132
- Publication date:
- 2024-11-21
- Acceptance date:
- 2024-07-01
- Event title:
- 18th European Conference on Computer Vision (ECCV 2024)
- Event location:
- Milan, Italy
- Event website:
- https://eccv.ecva.net/
- Event start date:
- 2024-09-29
- Event end date:
- 2024-10-04
- DOI:
- EISSN:
- 1611-3349
- ISSN:
- 0302-9743
- EISBN:
- 978-3-031-72904-1
- ISBN:
- 978-3-031-72903-4
- Language:
- English
- Keywords:
- Pubs id:
- 2039688
- Local pid:
- pubs:2039688
- Deposit date:
- 2024-10-17
Terms of use
- Copyright holder:
- Yang et al.
- Copyright date:
- 2025
- Rights statement:
- © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
- Notes:
- This paper was presented at the 18th European Conference on Computer Vision (ECCV 2024), 29th September - 4th October 2024, Milan, Italy.