Conference item
Localizing events in videos with multimodal queries
- Abstract:
- Localizing events in videos based on semantic queries is a pivotal task in video understanding research and user-oriented applications like video search. Yet, current research predominantly relies on natural language queries (NLQs), overlooking the potential of using multimodal queries (MQs) that incorporate images to flexibly represent semantic queries, particularly when it is difficult to express non-verbal or unfamiliar concepts in words. To bridge this gap, we introduce ICQ, a new benchmark designed for localizing events in videos with MQs, alongside an evaluation dataset ICQ-Highlight. To adapt and reevaluate existing video localization models for this new task, we propose 3 Multimodal Query Adaptation methods and a novel Surrogate Fine-Tuning strategy, serving as strong baseline methods. ICQ systematically benchmarks 12 state-of-the-art backbone models, spanning from specialized video localization models to Video Large Language Models. Our extensive experiments highlight the high potential of using MQs in real-world applications. We believe this is a first step toward video event localization with MQs.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 6.5MB)
- Publisher copy:
- 10.1109/CVPR52734.2025.00317
Authors
- Publisher:
- IEEE
- Host title:
- 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Pages:
- 3339-3351
- Publication date:
- 2025-06-10
- Acceptance date:
- 2025-02-27
- Event title:
- IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
- Event location:
- Nashville, TN, USA
- Event website:
- https://cvpr.thecvf.com/
- Event start date:
- 2025-06-11
- Event end date:
- 2025-06-15
- DOI:
- 10.1109/CVPR52734.2025.00317
- EISSN:
- 2575-7075
- ISSN:
- 1063-6919
- EISBN:
- 9798331543648
- ISBN:
- 9798331543655
- Language:
- English
- Pubs id:
- 2098086
- Local pid:
- pubs:2098086
- Deposit date:
- 2025-03-24
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2025
- Rights statement:
- © 2025 IEEE.
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)