Conference item

Out of time: automated lip sync in the wild

Abstract:

The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video.


We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video.


We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state of the art on standard benchmark datasets.
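
The core idea described in the abstract, learning a joint embedding for short audio windows and the corresponding mouth-image stacks so that in-sync pairs lie close together, can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the layer sizes, input shapes (13×20 MFCC windows, stacks of five 111×111 mouth crops) and the contrastive loss are assumptions chosen only to make the example self-contained.

```python
# Minimal two-stream sketch (illustrative only, not the paper's exact network):
# one stream embeds a short window of MFCC audio features, the other embeds
# the matching mouth crops, and the distance between the two embeddings is
# trained to be small for genuine (in-sync) pairs and large for offset pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioStream(nn.Module):
    """Embeds an MFCC window of shape (B, 1, 13, 20)."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))


class VisualStream(nn.Module):
    """Embeds five consecutive grayscale mouth crops, shape (B, 5, 111, 111)."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 64, 5, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))


def contrastive_loss(a, v, label, margin=1.0):
    """label = 1 for in-sync pairs, 0 for artificially offset pairs."""
    d = F.pairwise_distance(a, v)
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()


if __name__ == "__main__":
    audio_net, video_net = AudioStream(), VisualStream()
    mfcc = torch.randn(8, 1, 13, 20)      # batch of audio windows
    mouths = torch.randn(8, 5, 111, 111)  # matching batch of mouth-crop stacks
    labels = torch.randint(0, 2, (8,)).float()
    loss = contrastive_loss(audio_net(mfcc), video_net(mouths), labels)
    loss.backward()
    print(float(loss))
```

Given a trained pair of streams, the lip-sync error in a clip could then be estimated by sliding the audio window over a range of temporal offsets and taking the offset at which the audio-visual distance is smallest; the same distance could also serve as a score for active speaker detection.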

Publication status: Published
Peer review status: Peer reviewed

Authors

Institution: University of Oxford
Division: MPLS
Department: Engineering Science
Role: Author

Institution: University of Oxford
Division: MPLS
Department: Engineering Science
Role: Author

Funder: Engineering and Physical Sciences Research Council
Grant: EP/M013774/1
Publisher: Springer
Host title: Workshop on Multi-view Lip-reading, 13th Asian Conference on Computer Vision (ACCV 2016)
Journal: 13th Asian Conference on Computer Vision
Publication date: 2017-03-01
Acceptance date: 2016-05-27
Event location: Taipei
DOI:
Pubs id: pubs:656453
UUID: uuid:6bdd4768-6fbd-40ac-8efc-edca8a0325b3
Local pid: pubs:656453
Source identifiers: 656453
Deposit date: 2016-11-01
