Conference item
Olympus: a universal task router for computer vision tasks
- Abstract:
- We introduce Olympus, a new approach that transforms Multimodal Large Language Models (MLLMs) into a unified framework capable of handling a wide array of computer vision tasks. Utilizing a controller MLLM, Olympus delegates over 20 specialized tasks across images, videos, and 3D objects to dedicated modules. This instruction-based routing enables complex workflows through chained actions without the need for training heavy generative models. Olympus easily integrates with existing MLLMs, expanding their capabilities with comparable performance. Experimental results demonstrate that Olympus achieves an average routing accuracy of 94.75% across 20 tasks and precision of 91.82% in chained action scenarios, showcasing its effectiveness.
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
- Publisher:
- IEEE
- Acceptance date:
- 2025-02-27
- Event title:
- IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
- Event location:
- Nashville, Tennessee, USA
- Event website:
- https://cvpr.thecvf.com/Conferences/2025
- Event start date:
- 2025-06-11
- Event end date:
- 2025-06-15
- Language:
- English
- Pubs id:
- 2279675
- Local pid:
- pubs:2279675
- Deposit date:
- 2025-08-11
Terms of use
- Copyright date:
- 2025
- Notes:
- This paper was presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025), 11-15 June 2025, Nashville, Tennessee, USA. The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)