Conference item
Voicevector: multimodal enrolment vectors for speaker separation
- Abstract:
- We present a transformer-based architecture for voice separation of a target speaker from multiple other speakers and ambient noise. We achieve this by using two separate neural networks: (A) an enrolment network designed to craft speaker-specific embeddings, exploiting various combinations of audio and visual modalities; and (B) a separation network that accepts both the noisy signal and enrolment vectors as inputs, outputting the clean signal of the target speaker. The novelties are: (i) the enrolment vector can be produced from audio only, audio-visual data (using lip movements), or visual data alone (using lip movements from silent video); and (ii) the flexibility in conditioning the separation on multiple positive and negative enrolment vectors. We compare to previous methods and obtain superior performance.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
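The abstract does not specify how the separation network consumes several enrolment vectors at once. As a purely illustrative sketch, assuming each enrolment vector has a fixed dimension and that a role flag is appended to mark it as positive (the target speaker) or negative (a speaker to suppress), the conditioning input could be assembled as below. The function name `build_conditioning`, the flag encoding, and the vector dimension are hypothetical and are not taken from the paper.

```python
import numpy as np


def build_conditioning(positives, negatives, dim):
    """Stack enrolment vectors into a (P + N, dim + 1) conditioning matrix.

    The last column is a hypothetical role flag: +1 marks a positive
    (target speaker) vector, -1 marks a negative (interfering speaker) one.
    """
    rows = []
    for flag, group in ((1.0, positives), (-1.0, negatives)):
        for vec in group:
            vec = np.asarray(vec, dtype=np.float32)
            if vec.shape != (dim,):
                raise ValueError(f"expected shape ({dim},), got {vec.shape}")
            # Append the role flag so each row carries its own label.
            rows.append(np.append(vec, np.float32(flag)))
    if not rows:
        raise ValueError("need at least one enrolment vector")
    return np.stack(rows)


# Toy usage: random stand-ins for enrolment-network outputs (assumed dim = 4).
rng = np.random.default_rng(0)
dim = 4
target_audio = rng.standard_normal(dim)  # e.g. from audio-only enrolment
target_lips = rng.standard_normal(dim)   # e.g. from silent-video enrolment
interferer = rng.standard_normal(dim)    # a speaker to suppress
cond = build_conditioning([target_audio, target_lips], [interferer], dim)
print(cond.shape)  # (3, 5)
```

A variable-length set of flagged rows like this is one layout a transformer-style separator could attend over, which is why the number of positive and negative vectors is left unconstrained in the sketch.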
Access Document
- Files:
- Accepted manuscript (pdf, 407.2KB)
- Publisher copy:
- 10.1109/ICASSPW62465.2024.10627309
Authors
- Publisher:
- IEEE
- Host title:
- Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
- Pages:
- 785-789
- Publication date:
- 2024-08-15
- Acceptance date:
- 2024-04-14
- Event title:
- International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
- Event location:
- COEX, Seoul, South Korea
- Event website:
- https://2024.ieeeicassp.org/
- Event start date:
- 2024-04-14
- Event end date:
- 2024-04-19
- DOI:
- EISBN:
- 979-8-3503-7451-3
- ISBN:
- 979-8-3503-7452-0
- Language:
- English
- Keywords:
- Pubs id:
- 1996107
- Local pid:
- pubs:1996107
- Deposit date:
- 2024-05-14
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2024
- Rights statement:
- © IEEE 2024
- Notes:
- This paper was presented at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), 14th-19th April 2024, COEX, Seoul, South Korea. This is the accepted manuscript version of the article. The final version is available online from IEEE at: https://dx.doi.org/10.1109/ICASSPW62465.2024.10627309