Conference item

Out of time: automated lip sync in the wild

Abstract:

The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video.

We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video.

We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state-of-the-art on standard benchmark datasets.
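
The abstract leaves the implementation details to the paper; as a rough illustration of the idea described above, here is a minimal PyTorch-style sketch of a two-stream network with a contrastive objective. The layer sizes, input shapes (an MFCC window for the audio stream, a stack of mouth-region frames for the visual stream), and the margin-based loss are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioStream(nn.Module):
    """Maps a short window of audio features (e.g. MFCCs) to an embedding."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):  # x: (B, 1, n_mfcc, n_audio_frames)
        return self.fc(self.net(x).flatten(1))

class VisualStream(nn.Module):
    """Maps a stack of consecutive mouth-region frames to an embedding."""
    def __init__(self, n_frames=5, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_frames, 96, kernel_size=5, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(96, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(256, embed_dim)

    def forward(self, x):  # x: (B, n_frames, H, W), grayscale frame stack
        return self.fc(self.net(x).flatten(1))

def contrastive_loss(a, v, same, margin=1.0):
    """Pull embeddings of in-sync pairs together, push off-sync pairs apart.

    `same` is 1 for genuine (in-sync) audio-video pairs and 0 for false
    (shifted) pairs; both kinds can be mined from unlabelled video.
    """
    d = F.pairwise_distance(a, v)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

if __name__ == "__main__":
    audio = torch.randn(4, 1, 13, 20)         # batch of MFCC windows
    frames = torch.randn(4, 5, 112, 112)      # batch of 5-frame mouth stacks
    labels = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = in sync, 0 = off sync
    net_a, net_v = AudioStream(), VisualStream()
    loss = contrastive_loss(net_a(audio), net_v(frames), labels)
    print(loss.item())
```

With a network of this kind, the lip-sync error could be estimated at test time by sliding the audio window over a range of offsets and taking the offset that minimises the embedding distance; averaging that distance over a clip would give a confidence score of the sort usable for active speaker detection.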

Publication status:
Published
Peer review status:
Peer reviewed
Version:
Accepted manuscript

Publisher copy:
10.1007/978-3-319-54427-4_19

Authors


Joon Son Chung
Department:
Oxford, MPLS, Engineering Science

Andrew Zisserman
Department:
Oxford, MPLS, Engineering Science
Publisher:
Springer
Publication date:
2017-03-05
Acceptance date:
2016-05-27
DOI:
10.1007/978-3-319-54427-4_19
Pubs id:
pubs:656453
URN:
uri:6bdd4768-6fbd-40ac-8efc-edca8a0325b3
UUID:
uuid:6bdd4768-6fbd-40ac-8efc-edca8a0325b3
Local pid:
pubs:656453

