Conference item

The conversation: deep audio-visual speech enhancement

Abstract:

Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos. Existing works in this area have focussed on trying to separate utterances from known speakers in controlled environments. In this paper, we propose a deep audio-visual speech enhancement network that is able to separate a speaker's voice given lip regions in the corresponding video, by predicting both the magnitude and the phase of the target signal. The method is applicable to speakers unheard and un...

Publication status:
Published
Peer review status:
Reviewed (other)
Version:
Publisher's Version

Publisher copy:
10.21437/Interspeech.2018-1400

Authors

Afouras, T.
Institution:
University of Oxford
Division:
Mathematical Physical and Life Sciences
Department:
Engineering Science
Institution:
University of Oxford
Division:
Mathematical Physical and Life Sciences
Department:
Engineering Science
Oxford college:
Brasenose College
Publisher:
International Speech Communication Association
Volume:
2018
Pages:
3244-3248
Publication date:
2018-09-02
Acceptance date:
2018-06-03
DOI:
10.21437/Interspeech.2018-1400
ISSN:
1990-9772
Pubs id:
pubs:859243
URN:
uri:d04cc64a-7ae9-4a24-9804-f5b78235d543
UUID:
uuid:d04cc64a-7ae9-4a24-9804-f5b78235d543
Local pid:
pubs:859243
