Conference item
Seeing wake words: Audio-visual keyword spotting
- Abstract:
- The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for ‘in the wild’ videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a similarity map intermediate representation to separate the task into (i) sequence matching, and (ii) pattern detection, to decide whether the word is there and when; (2) we demonstrate that if audio i...
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Bibliographic Details
- Publisher:
- British Machine Vision Association
- Publication date:
- 2020-09-07
- Acceptance date:
- 2020-07-29
- Event title:
- British Machine Vision Conference, 2020
- Event location:
- Virtual event
- Event website:
- http://www.bmvc2020.com/
- Event start date:
- 2020-09-07
- Event end date:
- 2020-09-10
Item Description
- Language:
- English
- Pubs id:
- 1131235
- Local pid:
- pubs:1131235
- Deposit date:
- 2020-09-09
Terms of use
- Copyright holder:
- Momeni et al.
- Copyright date:
- 2020
- Rights statement:
- © 2020. The copyright of this document resides with its authors.
- Notes:
- Presented at the 31st British Machine Vision Virtual Conference, 7th-10th September 2020.