Conference item

Olympus: a universal task router for computer vision tasks

Abstract:
We introduce Olympus, a new approach that transforms Multimodal Large Language Models (MLLMs) into a unified framework capable of handling a wide array of computer vision tasks. Utilizing a controller MLLM, Olympus delegates over 20 specialized tasks across images, videos, and 3D objects to dedicated modules. This instruction-based routing enables complex workflows through chained actions without the need for training heavy generative models. Olympus easily integrates with existing MLLMs, expanding their capabilities with comparable performance. Experimental results demonstrate that Olympus achieves an average routing accuracy of 94.75% across 20 tasks and precision of 91.82% in chained action scenarios, showcasing its effectiveness.
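Illustrative sketch (not from the paper): the instruction-based routing and chained actions described in the abstract could be pictured roughly as below. The tag format `<task>prompt</task>`, the function names `parse_routes` and `run_chain`, and the stub modules are all hypothetical and chosen only for illustration; the abstract does not specify how the controller MLLM encodes its routing decisions.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# ASSUMPTION: the controller emits routing decisions as <task>prompt</task>
# spans. This tag format is hypothetical, not taken from the paper.
ROUTE_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

# A module takes the routed prompt plus the previous step's output (or None).
Module = Callable[[str, Optional[str]], str]


def parse_routes(controller_output: str) -> List[Tuple[str, str]]:
    """Extract (task, prompt) pairs in the order the controller emitted them."""
    return [
        (match.group(1), match.group(2).strip())
        for match in ROUTE_PATTERN.finditer(controller_output)
    ]


def run_chain(controller_output: str, modules: Dict[str, Module]) -> List[str]:
    """Dispatch each routed task to its module, feeding each result forward."""
    previous: Optional[str] = None
    results: List[str] = []
    for task, prompt in parse_routes(controller_output):
        if task not in modules:
            raise KeyError(f"no module registered for task {task!r}")
        previous = modules[task](prompt, previous)
        results.append(previous)
    return results


# Stub modules standing in for dedicated vision models (hypothetical).
def fake_image_gen(prompt: str, previous: Optional[str]) -> str:
    return f"image({prompt})"


def fake_image_edit(prompt: str, previous: Optional[str]) -> str:
    return f"edit({previous}, {prompt})"


MODULES: Dict[str, Module] = {
    "image_gen": fake_image_gen,
    "image_edit": fake_image_edit,
}

if __name__ == "__main__":
    output = "<image_gen>a red car</image_gen><image_edit>make it blue</image_edit>"
    for step in run_chain(output, MODULES):
        print(step)
```

In this sketch the controller only produces text, and the heavy lifting is left to whichever modules are registered, which mirrors the abstract's point that no heavy generative model needs to be trained.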
Publication status:
Accepted
Peer review status:
Peer reviewed

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author


Publisher:
IEEE
Acceptance date:
2025-02-27
Event title:
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
Event location:
Nashville, Tennessee, USA
Event website:
https://cvpr.thecvf.com/Conferences/2025
Event start date:
2025-06-11
Event end date:
2025-06-15


Language:
English
Pubs id:
2279675
Local pid:
pubs:2279675
Deposit date:
2025-08-11
