Conference item

Few-shot action recognition with permutation-invariant attention

Abstract:
Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. The encoded blocks are aggregated by permutation-invariant pooling to make our approach robust to varying action lengths and to long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class. Subsequently, the pooled representations are combined into simple relation descriptors which encode so-called query and support clips. Finally, the relation descriptors are fed to a comparator with the goal of similarity learning between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision. In naturalistic clips (of the same class) there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute the blocks of a clip and align the resulting attention regions with similarly permuted attention regions of the non-permuted clip to train the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101 and miniMIT datasets.
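The pooling and alignment ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the block features are made-up vectors rather than C3D outputs, the two scoring functions are hypothetical stand-ins for a learned attention module, and the squared-difference alignment loss is an assumed choice. It only shows that attention-weighted pooling ignores block order, and that a position-dependent attention scorer incurs a penalty when its output on permuted blocks is compared with the permuted output on the original blocks.

```python
import math

def softmax(scores):
    """Normalise raw scores into attention weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def permute(items, perm):
    """Reorder items so that position k holds items[perm[k]]."""
    return [items[i] for i in perm]

def attention_pool(blocks, scores):
    """Attention-weighted average of block features; a sum, so order-free."""
    weights = softmax(scores)
    dim = len(blocks[0])
    return [sum(w * b[d] for w, b in zip(weights, blocks)) for d in range(dim)]

def blockwise_scores(blocks):
    """Hypothetical scorer that looks only at each block's content."""
    return [sum(b) / len(b) for b in blocks]

def position_biased_scores(blocks):
    """Hypothetical scorer that also favours later positions in the clip."""
    return [sum(b) / len(b) + 0.5 * i for i, b in enumerate(blocks)]

def alignment_loss(blocks, perm, score_fn):
    """Assumed loss: squared gap between attention on permuted blocks
    and the permuted attention of the original, non-permuted blocks."""
    original = softmax(score_fn(blocks))
    permuted = softmax(score_fn(permute(blocks, perm)))
    target = permute(original, perm)
    return sum((p - t) ** 2 for p, t in zip(permuted, target))

# Three toy "blocks" with two features each (illustrative values only).
blocks = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
perm = [2, 0, 1]

scores = blockwise_scores(blocks)
pooled = attention_pool(blocks, scores)
pooled_permuted = attention_pool(permute(blocks, perm), permute(scores, perm))

loss_blockwise = alignment_loss(blocks, perm, blockwise_scores)
loss_positional = alignment_loss(blocks, perm, position_biased_scores)
```

Here `pooled` and `pooled_permuted` agree up to floating-point rounding, `loss_blockwise` is effectively zero, and `loss_positional` is strictly positive. Minimising a loss of this kind would push an attention module away from relying on block position, which is the intuition behind the self-supervised alignment described above.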
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1007/978-3-030-58558-7_31

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Zoology
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
Springer
Host title:
Proceedings of the European Conference on Computer Vision (ECCV 2020)
Journal:
Proceedings of the European Conference on Computer Vision (ECCV 2020)
Volume:
12350
Pages:
525-542
Series:
Lecture Notes in Computer Science
Publication date:
2020-10-29
Event title:
European Conference on Computer Vision (ECCV), 2020
Event location:
Online
Event website:
https://eccv2020.eu/
Event start date:
2020-08-23
Event end date:
2020-08-28
DOI:
10.1007/978-3-030-58558-7_31
EISSN:
1611-3349
ISSN:
0302-9743
EISBN:
978-3-030-58558-7
ISBN:
9783030585570


Language:
English
Pubs id:
1150997
Local pid:
pubs:1150997
Deposit date:
2021-01-05