Conference item
Self-supervised learning of a facial attribute embedding from video
- Abstract:
- We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time. To perform this task, we introduce a network, Facial Attributes-Net (FAb-Net), that is trained to embed multiple frames from the same video face-track into a common low-dimensional space. With this approach, we make three contributions: first, we show that the network can leverage information from multiple source frames by predicting confidence/attention masks for each frame; second, we demonstrate that using a curriculum learning regime improves the learned embedding; finally, we demonstrate that the network learns a meaningful face embedding that encodes information about head pose, facial landmarks and facial expression (i.e. facial attributes) without having been supervised with any labelled data. Our embedding is comparable or superior to state-of-the-art self-supervised methods on these tasks and approaches the performance of supervised methods.
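The abstract's first contribution, combining embeddings from multiple source frames via predicted confidence/attention masks, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the random-projection "encoder", the mean-based confidence head, the embedding size, and the softmax-normalised weighting are all assumptions standing in for learned network components.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 256  # assumed low-dimensional embedding size


def encode_frame(frame, proj):
    """Stand-in encoder: project a flattened frame into the embedding space."""
    return frame.ravel() @ proj


def confidence(embedding):
    """Stand-in confidence head: one scalar score per frame embedding."""
    return embedding.mean()  # placeholder; FAb-Net learns this


def fuse_frames(frames, proj):
    """Embed each source frame, then combine the embeddings with
    softmax-normalised confidence (attention) weights."""
    embeddings = np.stack([encode_frame(f, proj) for f in frames])
    scores = np.array([confidence(e) for e in embeddings])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()  # attention weights sum to 1
    return weights @ embeddings, weights


# Three frames from the same hypothetical face-track.
frames = [rng.standard_normal((64, 64)) for _ in range(3)]
proj = rng.standard_normal((64 * 64, EMBED_DIM)) / 64.0
fused, weights = fuse_frames(frames, proj)
```

In the paper's setting the encoder and confidence head are trained jointly, so frames that carry more useful information (e.g. a clearer view of the face) receive higher weight; here the scores are arbitrary but the fusion mechanics are the same.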
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher:
- British Machine Vision Association
- Host title:
- 29th British Machine Vision Conference (BMVC 2018)
- Journal:
- 29th British Machine Vision Conference (BMVC 2018)
- Publication date:
- 2018-09-06
- Acceptance date:
- 2018-07-02
- Pubs id:
- pubs:944867
- UUID:
- uuid:2e5e0096-7ccb-4127-9125-7a93c4fa04ba
- Local pid:
- pubs:944867
- Source identifiers:
- 944867
- Deposit date:
- 2018-11-21
Terms of use
- Copyright holder:
- Wiles
- Copyright date:
- 2018
- Notes:
- © 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.