Conference item

Seeing wake words: Audio-visual keyword spotting

Abstract:

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for ‘in the wild’ videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a similarity map intermediate representation to separate the task into (i) sequence matching, and (ii) pattern detection, to decide whether the word is there and when; (2) we demonstrate that if audio i...

Publication status:
Published
Peer review status:
Peer reviewed

Authors


Department:
ENGINEERING SCIENCE
Sub department:
Engineering Science
Oxford college:
Brasenose College
Role:
Author
ORCID:
0000-0002-8945-8573
Publisher:
British Machine Vision Association
Publication date:
2020-09-07
Acceptance date:
2020-07-29
Event title:
British Machine Vision Conference, 2020
Event location:
Virtual event
Event website:
http://www.bmvc2020.com/
Event start date:
2020-09-07
Event end date:
2020-09-10
Language:
English
Pubs id:
1131235
Local pid:
pubs:1131235
Deposit date:
2020-09-09
