Conference item
Out of time: automated lip sync in the wild
- Abstract:
The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video.
We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video.
We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state-of-the-art on standard benchmark datasets.
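The abstract describes the approach only at a high level. Below is a minimal sketch of what such a two-stream audio-visual network might look like; the layer sizes, input shapes (MFCC windows, stacked grayscale mouth crops), the contrastive objective, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a two-stream ConvNet for audio-visual synchronisation (PyTorch).
# All architectural details below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamSyncNet(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # Audio stream: a short window of 13 MFCC coefficients x 20 time
        # steps, treated as a 1-channel "image" (shape: N x 1 x 13 x 20).
        self.audio_stream = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        # Video stream: 5 consecutive grayscale mouth crops stacked as
        # channels (shape: N x 5 x 112 x 112).
        self.video_stream = nn.Sequential(
            nn.Conv2d(5, 64, 5, stride=2, padding=2), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, audio, video):
        # Embed both modalities into a shared space for comparison.
        return self.audio_stream(audio), self.video_stream(video)

def contrastive_loss(a, v, label, margin=1.0):
    # label = 1 for in-sync pairs, 0 for temporally shifted pairs; such
    # pairs can be mined from unlabelled video, so no annotation is needed.
    d = F.pairwise_distance(a, v)
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()

def sync_offset(model, audio_windows, video_clip):
    # Slide the audio window over K candidate offsets; the offset whose
    # embedding lies closest to the video embedding is the lip-sync estimate.
    with torch.no_grad():
        v = model.video_stream(video_clip)     # 1 x embed_dim
        a = model.audio_stream(audio_windows)  # K x embed_dim
        return torch.argmin(F.pairwise_distance(a, v.expand_as(a)))
```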
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 3.2MB)
- Publisher copy:
- 10.1007/978-3-319-54427-4_19
Authors
- Chung, Joon Son
- Zisserman, Andrew
Funding
Bibliographic Details
- Publisher:
- Springer
- Conference:
- 13th Asian Conference on Computer Vision
- Host title:
- Workshop on Multi-view Lip-reading, 13th Asian Conference on Computer Vision (ACCV 2016)
- Publication date:
- 2017-03-01
- Acceptance date:
- 2016-05-27
- Event location:
- Taipei
- DOI:
- 10.1007/978-3-319-54427-4_19
- Source identifiers:
- 656453
Item Description
- Pubs id:
- pubs:656453
- UUID:
- uuid:6bdd4768-6fbd-40ac-8efc-edca8a0325b3
- Local pid:
- pubs:656453
- Deposit date:
- 2016-11-01
Terms of use
- Copyright holder:
- Springer International Publishing AG
- Copyright date:
- 2017
- Notes:
- © Springer International Publishing AG 2017