Conference item
Puppet-Master: scaling interactive video generation as a motion prior for part-level dynamics
- Abstract:
- We introduce Puppet-Master, an interactive video generator that captures the internal, part-level motion of objects, serving as a proxy for modeling object dynamics universally. Given an image of an object and a set of "drags" specifying the trajectory of a few points on the object, the model synthesizes a video where the object's parts move accordingly. To build Puppet-Master, we extend a pre-trained image-to-video generator to encode the input drags. We also propose all-to-first attention, an alternative to conventional spatial attention that mitigates artifacts caused by fine-tuning a video generator on out-of-domain data. The model is fine-tuned on Objaverse-Animation-HQ, a new dataset of curated part-level motion clips obtained by rendering synthetic 3D animations. Unlike real videos, these synthetic clips avoid confounding part-level motion with overall object and camera motion. We extensively filter sub-optimal animations and augment the synthetic renderings with meaningful drags that emphasize the internal dynamics of objects. We demonstrate that Puppet-Master learns to generate part-level motions, unlike other motion-conditioned video generators that primarily move the object as a whole. Moreover, Puppet-Master generalizes well to out-of-domain real images, outperforming existing methods on real-world benchmarks in a zero-shot manner.
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
- Publisher:
- IEEE
- Acceptance date:
- 2025-07-23
- Event title:
- International Conference on Computer Vision (ICCV 2025)
- Event location:
- Honolulu, Hawai'i, USA
- Event website:
- https://iccv.thecvf.com/
- Event start date:
- 2025-10-19
- Event end date:
- 2025-10-23
- Language:
- English
- Pubs id:
- 2300260
- Local pid:
- pubs:2300260
- Deposit date:
- 2025-10-17
Terms of use
- Copyright date:
- 2025
- Notes:
- This paper will be presented at the International Conference on Computer Vision (ICCV 2025), 19th-23rd October 2025, Honolulu, Hawai'i, USA.
The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)