Conference item
Semantic-aware auto-encoders for self-supervised representation learning
- Abstract:
- The resurgence of unsupervised learning can be attributed to the remarkable progress of self-supervised learning, which includes generative $(\mathcal{G})$ and discriminative $(\mathcal{D})$ models. In computer vision, the mainstream self-supervised learning algorithms are $\mathcal{D}$ models. However, designing a $\mathcal{D}$ model can be over-complicated; moreover, some studies have hinted that a $\mathcal{D}$ model may not be as general and interpretable as a $\mathcal{G}$ model. In this paper, we switch from $\mathcal{D}$ models to $\mathcal{G}$ models using the classical auto-encoder (AE). Note that a vanilla $\mathcal{G}$ model is far less efficient than a $\mathcal{D}$ model in self-supervised computer vision tasks, as it wastes model capacity on overfitting semantic-agnostic high-frequency details. Inspired by perceptual learning, which can use cross-view learning to perceive concepts and semantics (following [26], we refer to semantics as visual concepts; e.g., a semantic-aware model can perceive visual concepts, and the learned features are efficient in object recognition, detection, etc.), we propose a novel AE that learns semantic-aware representations via cross-view image reconstruction. We use one view of an image as the input and another view of the same image as the reconstruction target. This kind of AE has rarely been studied before, and its optimization is very difficult. To enhance learning ability and find a feasible solution, we propose a semantic aligner that uses geometric-transformation knowledge to align the hidden code of the AE and ease optimization. These techniques significantly improve the representation learning ability of the AE and make self-supervised learning with $\mathcal{G}$ models possible. Extensive experiments on many large-scale benchmarks (e.g., ImageNet, COCO 2017, and SYSU-30k) demonstrate the effectiveness of our methods. Code is available at https://github.com/wanggrun/Semantic-Aware-AE.
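The cross-view setup described in the abstract (one view of an image as the AE input, a second view of the same image as the reconstruction target) can be sketched in a few lines. Below is a minimal NumPy illustration of the data side of that idea only; the specific augmentations (flip, crop-resize) and the pixel-wise MSE loss are hypothetical simplifications for illustration, not the paper's actual pipeline (see the linked repository for the real implementation).

```python
import numpy as np

def make_views(img, rng):
    """Create two views of the same image: the first serves as the AE input,
    the second as the reconstruction target (the cross-view setup).

    The augmentations here (random horizontal flip; random crop with a
    nearest-neighbour resize back to the original size) are hypothetical
    stand-ins for the paper's actual view generation.
    """
    h, w = img.shape
    # View A: random horizontal flip of the original image.
    view_a = img[:, ::-1] if rng.random() < 0.5 else img
    # View B: random crop covering 3/4 of each side, resized back to (h, w).
    top = rng.integers(0, h // 4)
    left = rng.integers(0, w // 4)
    crop = img[top: top + 3 * h // 4, left: left + 3 * w // 4]
    ys = np.arange(h) * crop.shape[0] // h  # nearest-neighbour row indices
    xs = np.arange(w) * crop.shape[1] // w  # nearest-neighbour column indices
    view_b = crop[np.ix_(ys, xs)]
    return view_a, view_b

def cross_view_loss(decoder_out, target):
    """Pixel-wise MSE between the decoded input view and the *other* view,
    so the AE cannot succeed by memorizing high-frequency detail of the input."""
    return float(np.mean((decoder_out - target) ** 2))
```

In this sketch, an encoder-decoder would map `view_a` to a reconstruction that is scored against `view_b`; because the two views differ geometrically, minimizing this loss pushes the hidden code toward view-invariant (semantic) content rather than per-pixel detail.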
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Files:
- Accepted manuscript (PDF, 1.3MB)
- Publisher copy:
- 10.1109/cvpr52688.2022.00944
Authors
- Publisher:
- IEEE
- Host title:
- 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Pages:
- 9654-9665
- Publication date:
- 2022-09-27
- Acceptance date:
- 2022-03-02
- Event title:
- IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR 2022)
- Event location:
- New Orleans, Louisiana
- Event website:
- https://cvpr2022.thecvf.com/
- Event start date:
- 2022-06-21
- Event end date:
- 2022-06-24
- DOI:
- 10.1109/cvpr52688.2022.00944
- EISSN:
- 2575-7075
- ISSN:
- 1063-6919
- EISBN:
- 9781665469463
- ISBN:
- 9781665469470
- Language:
- English
- Keywords:
- Pubs id:
- 1304012
- Local pid:
- pubs:1304012
- Deposit date:
- 2022-11-14
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2022
- Rights statement:
- © 2022 IEEE.
- Notes:
- This is the accepted manuscript version of the paper. The final version is available online from IEEE at: https://doi.org/10.1109/CVPR52688.2022.00944