Video representation learning by dense predictive coding

Han, T; Xie, W; Zisserman, A

Conference item

Abstract:: The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions. First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos. This learns a dense encoding of spatio-temporal blocks by recurrently predicting future representations. Second, we propose a curriculum training scheme to predict further into the future with progressively less temporal context. This encourages the model to encode only slowly varying spatio-temporal signals, therefore leading to semantic representations. Third, we evaluate the approach by first training the DPC model on the Kinetics-400 dataset with self-supervised learning, and then fine-tuning the representation on a downstream task, i.e. action recognition. With a single stream (RGB only), DPC pretrained representations achieve state-of-the-art self-supervised performance on both UCF101 (75.7% top-1 accuracy) and HMDB51 (35.7% top-1 accuracy), outperforming all previous learning methods by a significant margin, and approaching the performance of a baseline pre-trained on ImageNet. The code is available at https://github.com/TengdaHan/DPC.

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Han, T., Xie, W., & Zisserman, A. (2019). Video representation learning by dense predictive coding.

MLA Style

Han, T., et al. Video Representation Learning by Dense Predictive Coding. Computer Vision Foundation, 2019.

Chicago Style

Han, T, W Xie, and A Zisserman. 2019. “Video Representation Learning by Dense Predictive Coding.” In First Workshop on Large Scale Holistic Video Understanding. Computer Vision Foundation.

Access Document

Files:: Han_et_al_Video_representation_learning.pdf

(Version of record, 665.8KB)

Publication website:: http://openaccess.thecvf.com/content_ICCVW_2019/html/HVU/Han_Video_Representation_Learning_by_Dense_Predictive_Coding_ICCVW_2019_paper.html

Authors

Han, T

Institution:: University of Oxford
Department:: Engineering Science
Role:: Author

Xie, W

Institution:: University of Oxford
Department:: Engineering Science
Role:: Author

Zisserman, A

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Oxford college:: Brasenose College
Role:: Author
ORCID:: 0000-0002-8945-8573

Engineering & Physical Sciences Research Council

Grant:: EP/M013774/1

Publisher:: Computer Vision Foundation
Publication date:: 2019-11-02
Acceptance date:: 2019-08-15
Event title:: First Workshop on Large Scale Holistic Video Understanding
Event series:: IEEE International Conference on Computer Vision 2019
Event location:: Seoul, Korea
Event website:: https://holistic-video-understanding.github.io/workshops/iccv2019.html
Event start date:: 2019-10-27
Event end date:: 2019-11-02

Language:: English
Keywords:: FFR
Pubs id:: pubs:1060202
UUID:: uuid:16d379d6-776e-4a97-a440-5180a2d782a5
Local pid:: pubs:1060202
Source identifiers:: 1060202
Deposit date:: 2019-10-04

Terms of use

Copyright holder:: Han et al.
Notes:: This paper was presented at the First International Workshop on Large Scale Holistic Video Understanding, part of the International Conference on Computer Vision 2019, Seoul, South Korea, October-November 2019. This is the publisher's version of the paper, provided Open Access by the Computer Vision Foundation.

Licence:: CC Attribution (CC BY)
