Conference item
Few-shot action recognition with permutation-invariant attention
- Abstract:
- Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. The encoded blocks are aggregated by permutation-invariant pooling, which makes our approach robust to varying action lengths and to long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class. Subsequently, the pooled representations are combined into simple relation descriptors which encode so-called query and support clips. Finally, the relation descriptors are fed to a comparator that learns the similarity between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision. In naturalistic clips (of the same class) there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute the blocks of a clip and align the resulting attention regions with similarly permuted attention regions of the non-permuted clip, training the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101 and miniMIT datasets.
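The following is a minimal Python sketch, not the authors' implementation, of two ideas from the abstract: attention-weighted pooling of block features, and a self-supervised loss that compares the attention computed on a permuted clip with the equally permuted attention of the original clip. Everything here is an assumption for illustration: the toy feature vectors stand in for C3D block features, the attention form (a dot product with a weight vector `w` plus a per-position bias `pos_bias`) replaces the paper's learned spatial and temporal attention modules, the squared-error loss is a stand-in, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(blocks: Sequence[Vector], w: Vector, pos_bias: Vector) -> Vector:
    # Hypothetical attention: a content score (dot product with w) plus a
    # per-position bias. A non-uniform pos_bias makes the attention depend on
    # block order, which is what the self-supervised loss below penalises.
    scores = [sum(x * y for x, y in zip(b, w)) + pos_bias[t] for t, b in enumerate(blocks)]
    return softmax(scores)


def attention_pool(blocks: Sequence[Vector], w: Vector, pos_bias: Vector) -> Vector:
    # Attention-weighted average of block features. The result is independent
    # of block order only when pos_bias is uniform, because the weights then
    # move together with the blocks under any permutation.
    weights = temporal_attention(blocks, w, pos_bias)
    dim = len(blocks[0])
    return [sum(a * b[d] for a, b in zip(weights, blocks)) for d in range(dim)]


def alignment_loss(blocks: Sequence[Vector], w: Vector, pos_bias: Vector,
                   perm: Sequence[int]) -> float:
    # Attention computed on the permuted clip ...
    att_of_permuted = temporal_attention([blocks[p] for p in perm], w, pos_bias)
    # ... versus the attention of the original clip, permuted the same way.
    att = temporal_attention(blocks, w, pos_bias)
    permuted_att = [att[p] for p in perm]
    # Squared-error alignment (an assumption; the paper's loss may differ).
    return sum((a - b) ** 2 for a, b in zip(att_of_permuted, permuted_att))


if __name__ == "__main__":
    blocks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy block features
    w = [0.5, -0.5]
    perm = [2, 0, 1]
    # Close to 0: with a uniform bias the attention ignores block order.
    print(alignment_loss(blocks, w, [0.0, 0.0, 0.0], perm))
    # Clearly positive: the positional bias makes attention order-dependent.
    print(alignment_loss(blocks, w, [1.0, 0.0, -1.0], perm))
```

Minimising a loss of this kind over random permutations would push an order-dependent attention term (here `pos_bias`) towards uniformity, so that the attention-weighted pooling behaves as a permutation-invariant aggregation.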
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Files:
- Accepted manuscript (pdf, 3.8MB)
- Publisher copy:
- 10.1007/978-3-030-58558-7_31
- Publisher:
- Springer
- Host title:
- Proceedings of the European Conference on Computer Vision (ECCV 2020)
- Journal:
- Proceedings of the European Conference on Computer Vision (ECCV 2020)
- Volume:
- 12350
- Pages:
- 525-542
- Series:
- Lecture Notes in Computer Science
- Publication date:
- 2020-10-29
- Event title:
- European Conference on Computer Vision (ECCV), 2020
- Event location:
- Online
- Event website:
- https://eccv2020.eu/
- Event start date:
- 2020-08-23
- Event end date:
- 2020-08-28
- DOI:
- 10.1007/978-3-030-58558-7_31
- EISSN:
- 1611-3349
- ISSN:
- 0302-9743
- EISBN:
- 978-3-030-58558-7
- ISBN:
- 9783030585570
- Language:
- English
- Keywords:
- Pubs id:
- 1150997
- Local pid:
- pubs:1150997
- Deposit date:
- 2021-01-05
- ARK identifier:
Terms of use
- Copyright holder:
- Springer
- Copyright date:
- 2020
- Rights statement:
- © Springer Nature Switzerland AG 2020
- Notes:
- This paper was presented at the European Conference on Computer Vision (ECCV 2020), 23rd to 28th August 2020. This is the accepted manuscript version of the article. The final version is available from Springer at: https://doi.org/10.1007/978-3-030-58558-7_31