Conference item
CounTR: transformer-based generalised visual counting
- Abstract:
- In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of “exemplars”, i.e. zero-shot or few-shot counting. To this end, we make the following four contributions: (1) We introduce a novel transformer-based architecture for generalised visual object counting, termed Counting TRansformer (CounTR), which explicitly captures the similarity between image patches, or between patches and the given “exemplars”, with an attention mechanism; (2) We adopt a two-stage training regime that first pre-trains the model with self-supervised learning, followed by supervised fine-tuning; (3) We propose a simple, scalable pipeline for synthesising training images containing a large number of instances or instances from different semantic categories, explicitly forcing the model to make use of the given “exemplars”; (4) We conduct thorough ablation studies on the large-scale counting benchmark FSC-147 and demonstrate state-of-the-art performance in both the zero-shot and few-shot settings. Project page: https://verg-avesta.github.io/CounTR_Webpage/.
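The following is a minimal, hypothetical sketch of the cross-attention idea described in contribution (1): encoded image-patch tokens attend to encoded "exemplar" tokens, so that patch/exemplar similarity is expressed directly in the attention weights. This is not the authors' released implementation; all module names, dimensions, and the placement of a downstream density-map head are assumptions, and only the few-shot case (at least one exemplar) is covered.

```python
# Illustrative sketch only (not the CounTR implementation): image-patch
# tokens attend to exemplar tokens via multi-head cross-attention.
import torch
import torch.nn as nn


class ExemplarCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_tokens: torch.Tensor,
                exemplar_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens:    (B, N_patches, dim) encoded image patches
        # exemplar_tokens: (B, K, dim) K >= 1 encoded exemplar crops
        attended, _ = self.attn(query=patch_tokens,
                                key=exemplar_tokens,
                                value=exemplar_tokens)
        # Residual connection + normalisation; a density-map head
        # predicting per-pixel counts would follow in a full model.
        return self.norm(patch_tokens + attended)


if __name__ == "__main__":
    block = ExemplarCrossAttention()
    patches = torch.randn(1, 196, 256)    # e.g. a 14x14 patch grid
    exemplars = torch.randn(1, 3, 256)    # three exemplar "shots"
    print(block(patches, exemplars).shape)  # torch.Size([1, 196, 256])
```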
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Version of record (PDF, 4.0MB)
- Publication website:
- https://bmvc2022.mpi-inf.mpg.de/370/
Authors
- Liu, Chang
- Zhong, Yujie
- Zisserman, Andrew
- Xie, Weidi
- Publisher:
- BMVA Press
- Host title:
- Proceedings of the 33rd British Machine Vision Conference (BMVC 2022)
- Article number:
- 370
- Publication date:
- 2022-11-25
- Acceptance date:
- 2022-09-30
- Event title:
- 33rd British Machine Vision Conference (BMVC 2022)
- Event location:
- London, UK
- Event website:
- https://bmvc2022.org/
- Event start date:
- 2022-11-21
- Event end date:
- 2022-11-24
- Language:
- English
- Pubs id:
- 1315255
- Local pid:
- pubs:1315255
- Deposit date:
- 2022-12-15
Terms of use
- Copyright holder:
- Liu et al.
- Copyright date:
- 2022
- Rights statement:
- © 2022. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.