Conference item
Self-supervised learning of class embeddings from video
- Abstract:
- This work explores how to use self-supervised learning on videos to learn a class-specific image embedding that encodes pose and shape information in the form of landmarks. At train time, two frames of the same video of an object class (e.g. human upper body) are extracted and each is encoded to an embedding. Conditioned on these embeddings, the decoder network is tasked with transforming one frame into the other. To successfully perform long-range transformations (e.g. a wrist lowered in one image should be mapped to the same wrist raised in another), we introduce a new hierarchical probabilistic network decoder model. Once trained, the embedding can be used for a variety of downstream tasks and domains. We demonstrate our approach quantitatively on three distinct deformable object classes (human full bodies, upper bodies and faces) and show experimentally that the learned embeddings do indeed generalise. They achieve state-of-the-art performance in comparison to other self-supervised methods trained on the same datasets, and approach the performance of fully supervised methods.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 4.2MB)
- Publisher copy:
- 10.1109/ICCVW.2019.00364
Authors
- Publisher:
- IEEE
- Publication date:
- 2020-03-05
- Acceptance date:
- 2019-08-26
- Event title:
- International Conference on Computer Vision 2019 (ICCV 2019)
- Event location:
- Seoul, South Korea
- Event website:
- http://iccv2019.thecvf.com/
- Event start date:
- 2019-10-27
- Event end date:
- 2019-11-02
- DOI:
- 10.1109/ICCVW.2019.00364
- EISSN:
- 2473-9944
- ISSN:
- 2473-9936
- EISBN:
- 9781728150239
- ISBN:
- 9781728150246
- Language:
- English
- Pubs id:
- pubs:1078019
- UUID:
- uuid:0e2cb9fe-4524-44d7-9491-4907a4938bf1
- Local pid:
- pubs:1078019
- Source identifiers:
- 1078019
- Deposit date:
- 2019-12-16
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2020
- Rights statement:
- © 2019 IEEE.
- Notes:
- This paper was presented at the International Conference on Computer Vision 2019 (ICCV 2019), Seoul, South Korea, October-November 2019. This is the accepted manuscript version of the article. The final version is available online from IEEE at: https://doi.org/10.1109/ICCVW.2019.00364