Conference item
Personalizing retrieval using joint embeddings; or "The Return of Fluffy"
- Abstract:
- The goal of this paper is to retrieve images using a compound query that combines object instance information from an image with a natural text description of what that object is doing or where it is; for example, to retrieve an image of ‘Fluffy the unicorn (specified by an image) on someone’s head’. To achieve this, we design a mapping network that can ‘translate’ from a local image embedding (of the object instance) to a text token, such that the combination of the token and a natural language query is suitable for CLIP-style text encoding and image retrieval. Generating a text token in this manner involves a simple training procedure that only needs to be performed once for each object instance. We show that our approach of using a trainable mapping network, termed π-map, together with frozen CLIP text and image encoders, improves the state of the art on two benchmarks designed to assess personalized retrieval.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 12.5MB)
- Publisher copy:
- 10.1109/CBMI66578.2025.11339319
Authors
- Publisher:
- IEEE
- Host title:
- 2025 International Conference on Content-Based Multimedia Indexing (CBMI)
- Pages:
- 1-8
- Publication date:
- 2026-01-20
- Acceptance date:
- 2025-09-15
- Event title:
- 22nd International Conference on Content-based Multimedia Indexing (CBMI 2025)
- Event location:
- Dublin, Ireland
- Event website:
- https://www.robots.ox.ac.uk/~vgg/publications/2025/Korbar25/korbar25.pdf
- Event start date:
- 2025-10-22
- Event end date:
- 2025-10-24
- DOI:
- EISBN:
- 9798331555009
- ISBN:
- 9798331555016
- Language:
- English
- Keywords:
- Pubs id:
- 2320701
- Local pid:
- pubs:2320701
- Deposit date:
- 2025-11-10
- ARK identifier:
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2026
- Rights statement:
- © IEEE 2026
- Notes:
- This paper was presented at the 22nd International Conference on Content-based Multimedia Indexing (CBMI 2025), 22nd-24th October 2025, Dublin, Ireland. The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)