Conference item

Puppet-Master: scaling interactive video generation as a motion prior for part-level dynamics

Abstract:
We introduce Puppet-Master, an interactive video generator that captures the internal, part-level motion of objects, serving as a proxy for modeling object dynamics universally. Given an image of an object and a set of "drags" specifying the trajectory of a few points on the object, the model synthesizes a video where the object's parts move accordingly. To build Puppet-Master, we extend a pre-trained image-to-video generator to encode the input drags. We also propose all-to-first attention, an alternative to conventional spatial attention that mitigates artifacts caused by fine-tuning a video generator on out-of-domain data. The model is fine-tuned on Objaverse-Animation-HQ, a new dataset of curated part-level motion clips obtained by rendering synthetic 3D animations. Unlike real videos, these synthetic clips avoid confounding part-level motion with overall object and camera motion. We extensively filter sub-optimal animations and augment the synthetic renderings with meaningful drags that emphasize the internal dynamics of objects. We demonstrate that Puppet-Master learns to generate part-level motions, unlike other motion-conditioned video generators that primarily move the object as a whole. Moreover, Puppet-Master generalizes well to out-of-domain real images, outperforming existing methods on real-world benchmarks in a zero-shot manner.
Publication status:
Accepted
Peer review status:
Peer reviewed

Authors


More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0000-0002-3584-9640
More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author
More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
New College
Role:
Author
ORCID:
0000-0003-1374-2858


Publisher:
IEEE
Acceptance date:
2025-07-23
Event title:
International Conference on Computer Vision (ICCV 2025)
Event location:
Honolulu, Hawai'i, USA
Event website:
https://iccv.thecvf.com/
Event start date:
2025-10-19
Event end date:
2025-10-23


Language:
English
Pubs id:
2300260
Local pid:
pubs:2300260
Deposit date:
2025-10-17
