Voicevector: multimodal enrolment vectors for speaker separation

Conference item

Abstract:: We present a transformer-based architecture for voice separation of a target speaker from multiple other speakers and ambient noise. We achieve this by using two separate neural networks: (A) an enrolment network designed to craft speaker-specific embeddings, exploiting various combinations of audio and visual modalities; and (B) a separation network that accepts both the noisy signal and enrolment vectors as inputs, outputting the clean signal of the target speaker. The novelties are: (i) the enrolment vector can be produced from audio only, audio-visual data (using lip movements), or visual data alone (using lip movements from silent video); and (ii) the flexibility in conditioning the separation on multiple positive and negative enrolment vectors. We compare with previous methods and obtain superior performance.

Files:: Rahimi_et_al_2024_Voicevector_multimodal_enrolment.pdf

(Accepted manuscript, pdf, 407.2KB)

Publisher:: IEEE
Host title:: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
Pages:: 785-789
Publication date:: 2024-08-15
Acceptance date:: 2024-04-14
Event title:: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
Event location:: COEX, Seoul, South Korea
Event website:: https://2024.ieeeicassp.org/
Event start date:: 2024-04-14
Event end date:: 2024-04-19
DOI:: 10.1109/ICASSPW62465.2024.10627309
EISBN:: 979-8-3503-7451-3
ISBN:: 979-8-3503-7452-0

Copyright holder:: IEEE
Notes:: This paper was presented at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), 14th-19th April 2024, COEX, Seoul, South Korea. This is the accepted manuscript version of the article. The final version is available online from IEEE at: https://dx.doi.org/10.1109/ICASSPW62465.2024.10627309

Licence:: Terms and Conditions of Use for Oxford University Research Archive
