Conference item
Sparse in space and time: audio-visual synchronisation with trainable selectors
- Abstract:
- The objective of this paper is audio-visual synchronisation of general videos ‘in the wild’. In such videos, the events that provide synchronisation cues may be spatially small and may occur only infrequently during a clip that is many seconds long, i.e. the synchronisation signal is ‘sparse in space and time’. This contrasts with synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space. We make four contributions: (i) to handle the longer temporal sequences required for sparse synchronisation signals, we design a multi-modal transformer model that employs ‘selectors’ to distil the long audio and visual streams into short sequences, which are then used to predict the temporal offset between the streams; (ii) we identify artefacts that can arise from the compression codecs used for audio and video, which audio-visual models can exploit during training to solve the synchronisation task artificially; (iii) we curate a dataset containing only synchronisation signals that are sparse in time and space; and (iv) we show the effectiveness of the proposed model on both dense and sparse datasets, quantitatively and qualitatively. Project page: v-iashin.github.io/SparseSync
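The idea of "selectors" distilling a long stream into a short sequence can be illustrated with learnable query vectors that cross-attend over the stream. The sketch below is a minimal, hypothetical illustration of that general mechanism, not the authors' implementation: the function names, the token counts, the dimensionality, and the simplification of using the tokens directly as keys and values (no learned projections) are all assumptions. The offset-prediction head is not shown.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select(queries, tokens):
    """Distil a long token sequence into len(queries) vectors (hypothetical helper).

    Each query (a stand-in for a learnable 'selector') attends over all tokens
    with scaled dot-product attention and returns the attention-weighted sum of
    the tokens. Assumption: tokens serve directly as keys and values.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        weights = softmax([dot(q, t) * scale for t in tokens])
        out.append([sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)])
    return out


# Usage: random stand-ins for long audio and visual token streams (sizes are arbitrary).
random.seed(0)
d = 8
audio_tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(500)]
video_tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(125)]
audio_selectors = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]
video_selectors = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]

# 625 input tokens are distilled into a short sequence of 8 vectors.
short_seq = select(audio_selectors, audio_tokens) + select(video_selectors, video_tokens)
print(len(short_seq), len(short_seq[0]))  # 8 8
```

The cost of this step grows linearly with stream length for a fixed number of selectors, which is why such a short sequence, rather than the full streams, would be passed on to the offset predictor.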
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publication website:
- https://bmvc2022.mpi-inf.mpg.de/395/
Authors
- Publisher:
- British Machine Vision Association
- Host title:
- 33rd British Machine Vision Conference Proceedings
- Article number:
- 395
- Publication date:
- 2022-11-24
- Acceptance date:
- 2022-09-30
- Event title:
- 33rd British Machine Vision Conference (BMVC 2022)
- Event location:
- London
- Event website:
- https://bmvc2022.org/
- Event start date:
- 2022-11-21
- Event end date:
- 2022-11-25
- Language:
- English
- Keywords:
- Pubs id:
- 1315264
- Local pid:
- pubs:1315264
- Deposit date:
- 2022-12-15
Terms of use
- Copyright holder:
- Iashin et al.
- Copyright date:
- 2022
- Rights statement:
- © 2022. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.