Conference item
ELIP: enhanced visual-language foundation models for image retrieval
- Abstract:
- The objective in this paper is to improve the performance of text-to-image retrieval. To this end, we introduce a new framework that can boost the performance of large-scale pre-trained vision-language models so that they can be used for text-to-image re-ranking. The approach, Enhanced Language-Image Pre-training (ELIP), uses the text query, via a simple MLP mapping network, to predict a set of visual prompts that condition the ViT image encoding. ELIP can easily be applied to the commonly used CLIP, SigLIP and BLIP-2 networks. On the evaluation side, we set up two new out-of-distribution (OOD) benchmarks, Occluded COCO and ImageNet-R, to assess the zero-shot generalisation of the models to different domains. The results demonstrate that ELIP significantly boosts CLIP/SigLIP/SigLIP2 text-to-image retrieval performance and outperforms BLIP-2 on several benchmarks, as well as providing an easy means to adapt to OOD datasets.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
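The mechanism summarised in the abstract can be illustrated with a toy sketch in pure Python. In that mechanism, an MLP maps the text query to visual prompts, those prompts condition the image encoder, and the conditioned image embeddings are used to re-rank retrieved candidates. This is not the authors' implementation. The embedding width, prompt count, two-layer MLP shape, random untrained weights, prepending prompts to the patch tokens, and the single identity-projection self-attention layer standing in for the ViT are all assumptions. The names `PromptMapper`, `encode_image` and `rerank` are hypothetical.

```python
import math
import random

# Toy sizes (assumptions for illustration, not values from the paper).
DIM = 8          # shared text/image embedding width
NUM_PROMPTS = 4  # visual prompt tokens predicted per text query


def linear(x, w, b):
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def init_layer(n_out, n_in, rng):
    """Random (untrained) weights with 1/sqrt(n_in) scaling and zero bias."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class PromptMapper:
    """Hypothetical two-layer MLP: text embedding -> NUM_PROMPTS prompt tokens."""

    def __init__(self, dim, num_prompts, rng):
        self.dim, self.num_prompts = dim, num_prompts
        self.hidden = init_layer(dim, dim, rng)
        self.out = init_layer(dim * num_prompts, dim, rng)

    def __call__(self, text_emb):
        h = relu(linear(text_emb, *self.hidden))
        flat = linear(h, *self.out)
        # Split the flat output into num_prompts tokens of width dim.
        return [flat[i * self.dim:(i + 1) * self.dim] for i in range(self.num_prompts)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """One single-head self-attention layer with identity Q/K/V projections,
    a drastic simplification standing in for the ViT."""
    d = len(seq[0])
    out = []
    for q in seq:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq])
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def encode_image(patch_tokens, prompt_tokens):
    """Prepend the text-derived prompts to the patch tokens, attend, then mean-pool."""
    out = self_attention(prompt_tokens + patch_tokens)
    return [sum(tok[j] for tok in out) / len(out) for j in range(len(out[0]))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(text_emb, shortlist, mapper):
    """Re-score a first-stage shortlist {image_id: patch_tokens}, best match first."""
    prompts = mapper(text_emb)  # computed once per query
    scored = [(cosine(text_emb, encode_image(patches, prompts)), image_id)
              for image_id, patches in shortlist.items()]
    return [image_id for _, image_id in sorted(scored, reverse=True)]


if __name__ == "__main__":
    rng = random.Random(0)
    mapper = PromptMapper(DIM, NUM_PROMPTS, rng)
    text_emb = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    # Hypothetical shortlist: 5 images, 6 patch tokens each.
    shortlist = {f"img{i}": [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(6)]
                 for i in range(5)}
    print(rerank(text_emb, shortlist, mapper))
```

The prompts take part in attention alongside the patch tokens, so every candidate's embedding depends on the query. As a result, images must be re-encoded for each new query. That per-query cost is presumably why the abstract frames ELIP as a re-ranker rather than as a replacement for first-stage retrieval over a full gallery.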
Access Document
- Files:
- Accepted manuscript (PDF, 3.2 MB)
- Publisher copy:
- 10.1109/CBMI66578.2025.11339290
Authors
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/T028572/1
- Publisher:
- IEEE
- Publication date:
- 2026-01-20
- Acceptance date:
- 2025-09-15
- Event title:
- 22nd Conference on Content-Based Multimedia Indexing (CBMI 2025)
- Event location:
- Dublin, Ireland
- Event website:
- https://www.cbmi2025.org/
- Event start date:
- 2025-10-22
- Event end date:
- 2025-10-24
- DOI:
- Language:
- English
- Keywords:
- Pubs id:
- 2300480
- Local pid:
- pubs:2300480
- Deposit date:
- 2025-10-20
- ARK identifier:
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2026
- Rights statement:
- Copyright © 2025, IEEE
- Notes:
- The author accepted manuscript (AAM) of this conference paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)