Conference item

Video representation learning by dense predictive coding

Abstract:
The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions. First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos. This learns a dense encoding of spatio-temporal blocks by recurrently predicting future representations. Second, we propose a curriculum training scheme to predict further into the future with progressively less temporal context. This encourages the model to encode only slowly varying spatio-temporal signals, therefore leading to semantic representations. Third, we evaluate the approach by first training the DPC model on the Kinetics-400 dataset with self-supervised learning, and then fine-tuning the representation on a downstream task, i.e. action recognition. With a single stream (RGB only), DPC pretrained representations achieve state-of-the-art self-supervised performance on both UCF101 (75.7% top-1 accuracy) and HMDB51 (35.7% top-1 accuracy), outperforming all previous learning methods by a significant margin, and approaching the performance of a baseline pre-trained on ImageNet. The code is available at https://github.com/TengdaHan/DPC.
Publication status:
Published
Peer review status:
Peer reviewed

Authors


Institution:
University of Oxford
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Brasenose College
Role:
Author
ORCID:
0000-0002-8945-8573


Publisher:
Computer Vision Foundation
Publication date:
2019-11-02
Acceptance date:
2019-08-15
Event title:
First Workshop on Large Scale Holistic Video Understanding
Event series:
IEEE International Conference on Computer Vision 2019
Event location:
Seoul, Korea
Event website:
https://holistic-video-understanding.github.io/workshops/iccv2019.html
Event start date:
2019-10-27
Event end date:
2019-11-02


Language:
English
Keywords:
Pubs id:
pubs:1060202
UUID:
uuid:16d379d6-776e-4a97-a440-5180a2d782a5
Local pid:
pubs:1060202
Source identifiers:
1060202
Deposit date:
2019-10-04
