Conference item

WISE: a multimodal search engine for visual scenes, audio, objects, faces, speech, and metadata

Abstract:
In this paper, we present WISE, an open-source audiovisual search engine which integrates a range of multimodal retrieval capabilities into a single practical tool, accessible to users without machine learning expertise. WISE supports natural-language and reverse-image queries at both the scene level (e.g. empty street) and object level (e.g. horse) across images and videos; face-based search for specific individuals; audio retrieval of acoustic events using text (e.g. wood creak) or an audio file; search over automatically transcribed speech; and filtering by user-provided metadata. Rich insights can be obtained by combining queries across modalities -- for example, retrieving German trains from a historical archive by applying the object query "train" and the metadata query "Germany", or searching for a face in a place. By employing vector search techniques, WISE can scale to support efficient retrieval over millions of images or thousands of hours of video. Its modular architecture facilitates the integration of new audio or visual models. WISE can be deployed locally for private or sensitive collections, and has been applied to a number of disparate real-world use cases. Code is available at https://gitlab.com/vgg/wise/wise
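The abstract's example of combining an object query ("train") with a metadata query ("Germany") can be illustrated with a small sketch. This is not WISE's implementation: the embeddings are made-up three-dimensional vectors, the `search` function and the `country` metadata field are hypothetical, and the ranking is a brute-force cosine-similarity scan rather than the approximate nearest-neighbour index a system operating over millions of images would likely need.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vec, metadata_filter=None, top_k=3):
    """Rank items by similarity to query_vec, keeping only items whose
    metadata matches every key/value pair in metadata_filter."""
    metadata_filter = metadata_filter or {}
    hits = []
    for item in index:
        if all(item["metadata"].get(k) == v for k, v in metadata_filter.items()):
            hits.append((cosine(item["vector"], query_vec), item["id"]))
    hits.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in hits[:top_k]]


# Toy index: the vectors stand in for image embeddings (assumed values).
INDEX = [
    {"id": "img_001", "vector": [0.9, 0.1, 0.0], "metadata": {"country": "Germany"}},
    {"id": "img_002", "vector": [0.8, 0.2, 0.1], "metadata": {"country": "France"}},
    {"id": "img_003", "vector": [0.1, 0.9, 0.2], "metadata": {"country": "Germany"}},
]

# Stand-in for the embedding of the text query "train" (assumed value).
TRAIN_QUERY = [1.0, 0.0, 0.0]

results = search(INDEX, TRAIN_QUERY, {"country": "Germany"}, top_k=2)
print(results)
```

In this sketch the metadata filter is applied before ranking, so the French image is excluded even though its vector is close to the query; whether filtering happens before or after the vector search in a real system is a design choice this example does not settle.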
Publication status:
Accepted
Peer review status:
Peer reviewed

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Wolfson College
Role:
Author
ORCID:
0000-0002-5455-3343


Funder identifier:
https://ror.org/0439y7842
Grant:
T028572/1


Publisher:
Association for Computing Machinery
Acceptance date:
2026-04-03
Event title:
49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026)
Event location:
Melbourne, Australia
Event website:
https://sigir2026.org/
Event start date:
2026-07-20
Event end date:
2026-07-24


Language:
English
Pubs id:
2433903
Local pid:
pubs:2433903
Source identifiers:
W7163553692
Deposit date:
2026-06-16