Conference item
What happens next? Anticipating future motion by generating point trajectories
- Abstract:
We consider the problem of forecasting motion from a single image, i.e., predicting how objects in the world are likely to move, without the ability to observe other parameters such as the object velocities or the forces applied to them. We formulate this task as conditional generation of dense trajectory grids with a model that closely follows the architecture of modern video generators but outputs motion trajectories instead of pixels. This approach captures scene-wide dynamics and uncertainty, yielding more accurate and diverse predictions than prior regressors and generators. Although recent state-of-the-art video generators are often regarded as world models, we show that they struggle with forecasting motion from a single image, even in simple physical scenarios such as falling blocks or mechanical object interactions, despite fine-tuning on such data. We show that this limitation arises from the overhead of generating pixels rather than directly modeling motion.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 16.3MB)
- Publication website:
- https://openreview.net/forum?id=t1vMYl1yhe
Authors
- Publisher:
- OpenReview
- Host title:
- Proceedings of the 14th International Conference on Learning Representations (ICLR 2026)
- Article number:
- 9975
- Publication date:
- 2026-01-26
- Acceptance date:
- 2026-01-26
- Event title:
- 14th International Conference on Learning Representations (ICLR 2026)
- Event location:
- Rio de Janeiro, Brazil
- Event website:
- https://iclr.cc/Conferences/2026
- Event start date:
- 2026-04-23
- Event end date:
- 2026-04-27
- Language:
- English
- Pubs id:
- 2403471
- Local pid:
- pubs:2403471
- Deposit date:
- 2026-04-08
- ARK identifier:
Terms of use
- Copyright holder:
- Boduljak et al.
- Copyright date:
- 2026
- Rights statement:
- © The Authors 2026.
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)