Self-supervised learning of structural representations of visual objects

Jakab, T

Thesis

Abstract:: This thesis explores how a computer can learn the structure of visual objects in the absence of strong supervision using self-supervised learning. We demonstrate that structural representations of objects can be learned using an autoencoding framework with reconstruction as the key learning signal. We do this by engineering bottlenecks that disentangle object structure from other factors of variation. Moreover, we design the bottlenecks to represent the object structure in the form of 2D and 3D object landmarks or a 3D mesh. Specifically, we develop a method that automatically discovers 2D object landmarks without any annotations, using a conditional autoencoder with a 2D keypoint bottleneck that disentangles pose, represented as 2D keypoints, from appearance. Although self-supervised learning methods can learn stable object landmarks, the automatically discovered landmarks are not aligned with the landmarks that human annotators would label. To address this, we present a method that can inject an unpaired empirical prior into a conditional autoencoder by introducing a novel form of landmark autoencoding that can leverage powerful image discriminators used in adversarial learning. A by-product of these conditional autoencoding methods is that generation can be interactively controlled by manipulating the keypoints in the bottleneck. We leverage this feature in a novel method for interactive 3D shape deformation. The method is trained in a self-supervised way to use automatically discovered 3D landmarks to align pairs of 3D shapes. At test time, the method allows the user to interactively deform the object shape via the discovered 3D object landmarks. Finally, we present a method that uses a photo-geometric autoencoder to recover the 3D shape of an object category without any 3D annotations. It uses videos for training and learns to disentangle an input image into a rigid pose, texture, and a deformable shape model.
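This record gives no implementation details, so the following is only a rough illustration of what a 2D keypoint bottleneck of the kind mentioned in the abstract could look like. It assumes that an encoder produces one score map per landmark, that each map is reduced to a single (x, y) location by a soft-argmax, and that the locations are re-rendered as Gaussian blobs before being passed to a decoder. The function names, the [-1, 1] coordinate convention, and the `sigma` value are hypothetical choices made for this sketch and are not taken from the thesis.

```python
import numpy as np


def spatial_softmax(scores):
    """Turn each (H, W) score map into a probability map that sums to 1."""
    k, h, w = scores.shape
    flat = scores.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.reshape(k, h, w)


def keypoints_from_scores(scores):
    """Soft-argmax: expected (x, y) location of each map, in [-1, 1]."""
    k, h, w = scores.shape
    probs = spatial_softmax(scores)
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    x = (probs.sum(axis=1) * xs).sum(axis=1)  # marginal over rows, then mean x
    y = (probs.sum(axis=2) * ys).sum(axis=1)  # marginal over columns, then mean y
    return np.stack([x, y], axis=1)  # shape (K, 2)


def render_gaussians(keypoints, height, width, sigma=0.1):
    """Render each keypoint as an isotropic Gaussian map of shape (H, W)."""
    xs = np.linspace(-1.0, 1.0, width)[None, None, :]
    ys = np.linspace(-1.0, 1.0, height)[None, :, None]
    kx = keypoints[:, 0][:, None, None]
    ky = keypoints[:, 1][:, None, None]
    return np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2 * sigma ** 2))
```

In this sketch the only information that survives the bottleneck is K coordinate pairs, so a decoder that also receives a second image of the same object has to take appearance from that image and pose from the keypoints. Moving a keypoint before rendering would change the Gaussian map and hence the decoder input, which is the kind of interactive control the abstract describes.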

Cite

Cite this record

APA Style

Jakab, T. (2021). Self-supervised learning of structural representations of visual objects [PhD thesis]. University of Oxford.

MLA Style

Jakab, T. Self-Supervised Learning of Structural Representations of Visual Objects. University of Oxford, 2021.

Chicago Style

Jakab, T. 2021. “Self-Supervised Learning of Structural Representations of Visual Objects.” PhD thesis, University of Oxford.

Access Document

Files:: tomas_jakab_thesis.pdf

(Dissemination version, pdf, 48.0MB)

Authors

Jakab, T

Role:: Author

Contributors

Vedaldi, A

Role:: Supervisor

Funding

Clarendon Fund

Funder identifier:: http://dx.doi.org/10.13039/501100014748
Funding agency for:: Jakab, T

DOI:: 10.5287/ora-brvqjgknr
Type of award:: DPhil
Level of award:: Doctoral
Awarding institution:: University of Oxford

Language:: English
Keywords:: keypoints, self-supervised, computer vision, unsupervised, 3D reconstruction
Subjects:: Computer vision
Pubs id:: 2043072
Local pid:: pubs:2043072
Deposit date:: 2022-07-08

Terms of use

Copyright holder:: Tomas Jakab

Licence:: Terms and Conditions of Use for Oxford University Research Archive
