Conference item
WISE: a multimodal search engine for visual scenes, audio, objects, faces, speech, and metadata
- Abstract:
- In this paper, we present WISE, an open-source audiovisual search engine that integrates a range of multimodal retrieval capabilities into a single practical tool, accessible to users without machine learning expertise. WISE supports natural-language and reverse-image queries at both the scene level (e.g., "empty street") and the object level (e.g., "horse") across images and videos; face-based search for specific individuals; audio retrieval of acoustic events using text (e.g., "wood creak") or an audio file; search over automatically transcribed speech; and filtering by user-provided metadata. Rich insights can be obtained by combining queries across modalities: for example, retrieving German trains from a historical archive by applying the object query "train" and the metadata query "Germany", or searching for a specific person's face in a particular place. By employing vector search techniques, WISE can scale to support efficient retrieval over millions of images or thousands of hours of video. Its modular architecture facilitates the integration of new audio or visual models. WISE can be deployed locally for private or sensitive collections, and has been applied to a number of disparate real-world use cases. Code is available at https://gitlab.com/vgg/wise/wise.
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
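The cross-modal query described in the abstract (object query "train" combined with metadata query "Germany") can be illustrated with a minimal sketch. This is not WISE's implementation or API: the names `Item`, `normalise`, and `search` are hypothetical, and embeddings are assumed to be precomputed by some joint text-image model (the toy three-dimensional vectors below merely stand in for real ones). The sketch ranks items by exact brute-force cosine similarity after applying a metadata filter; a system operating at the scale described would typically rely on an approximate nearest-neighbour index instead.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical indexed item: a precomputed embedding plus user-provided metadata."""
    item_id: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def normalise(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search(items, query_embedding, metadata_filter=None, k=5):
    """Return up to k (score, item_id) pairs, best first.

    An item is kept only if, for every key/value pair in metadata_filter,
    its metadata value for that key contains the filter value (case-insensitive).
    Survivors are ranked by cosine similarity to the query embedding
    (exact brute force; an assumption made for illustration).
    """
    q = normalise(query_embedding)
    scored = []
    for item in items:
        if metadata_filter and not all(
            value.lower() in item.metadata.get(key, "").lower()
            for key, value in metadata_filter.items()
        ):
            continue  # item fails the metadata query
        score = sum(a * b for a, b in zip(q, normalise(item.embedding)))
        scored.append((score, item.item_id))
    scored.sort(reverse=True)
    return scored[:k]


# Toy data: the vectors are invented stand-ins for real model embeddings.
items = [
    Item("img1", [0.9, 0.1, 0.0], {"country": "Germany"}),
    Item("img2", [0.8, 0.2, 0.1], {"country": "France"}),
    Item("img3", [0.1, 0.9, 0.2], {"country": "Germany"}),
]
train_query = [1.0, 0.0, 0.0]  # stands in for the embedding of the text "train"

results = search(items, train_query, {"country": "Germany"}, k=2)
print([item_id for _, item_id in results])  # → ['img1', 'img3']
```

Filtering before ranking means the metadata query narrows the candidate set and the visual query orders what remains; "img2" scores highly for "train" but is excluded because its metadata does not match "Germany".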
- Funder:
- Engineering and Physical Sciences Research Council
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- T028572/1
- Publisher:
- Association for Computing Machinery
- Acceptance date:
- 2026-04-03
- Event title:
- 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026)
- Event location:
- Melbourne, Australia
- Event website:
- https://sigir2026.org/
- Event start date:
- 2026-07-20
- Event end date:
- 2026-07-24
- Language:
- English
- Keywords:
- Pubs id:
- 2433903
- Local pid:
- pubs:2433903
- Source identifiers:
- W7163553692
- Deposit date:
- 2026-06-16
- ARK identifier:
Terms of use
- Copyright holder:
- Sridhar et al.
- Copyright date:
- 2026
- Rights statement:
- © 2026 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.
- Notes:
- This conference paper has been accepted for presentation at the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026).
- Licence:
- CC Attribution (CC BY)