
Conference item

VCode: a multimodal coding benchmark with SVG as symbolic visual representation

Abstract:
Code has emerged as a precise, executable medium for language-centric tasks, while visual-centric coding remains underexplored. Conventional image representations rely on RGB pixels, which capture visual appearance but offer limited symbolic abstraction. In this work, we advocate SVG code as a compact, interpretable, and executable visual representation. We introduce VCode, a benchmark that reframes multimodal understanding as code generation: given an image, a model must produce SVG code that preserves the image's symbolic meaning for downstream reasoning. VCode covers three domains: general commonsense, professional disciplines, and visual-centric perception. To assess symbolic fidelity, we propose CodeVQA, a novel evaluation protocol in which a policy model answers questions about the rendered SVGs; correct answers indicate faithful symbolic preservation. We also introduce VCoder, an agentic framework that augments VLMs with test-time revision and visual tool use, yielding substantial improvements over strong baselines. The models are available at https://csu-jpg.github.io/VCode.
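As an illustration only, the CodeVQA idea in the abstract (a policy model answers questions over rendered SVGs, and correct answers count as evidence of symbolic fidelity) can be sketched as a simple accuracy loop. This is a minimal sketch under stated assumptions, not the benchmark's official implementation: `VQAItem`, `codevqa_accuracy`, and the `generate_svg`, `render`, and `policy_answer` callables are hypothetical names, and exact-match scoring after case and whitespace normalization is an assumed answer-matching rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class VQAItem:
    image_id: str
    question: str
    answer: str  # gold answer for the original image (assumed field)


def codevqa_accuracy(
    items: Sequence[VQAItem],
    generate_svg: Callable[[str], str],            # hypothetical: image_id -> SVG code from the model under test
    render: Callable[[str], object],               # hypothetical: SVG code -> rendered image
    policy_answer: Callable[[object, str], str],   # hypothetical policy model: (rendered image, question) -> answer
) -> float:
    """Fraction of questions the policy answers correctly from rendered SVGs (assumed exact-match rule)."""
    if not items:
        return 0.0
    rendered: Dict[str, object] = {}  # render each image's SVG once, reuse it across its questions
    correct = 0
    for item in items:
        if item.image_id not in rendered:
            rendered[item.image_id] = render(generate_svg(item.image_id))
        pred = policy_answer(rendered[item.image_id], item.question)
        # Assumed matching rule: case- and whitespace-insensitive exact match.
        if pred.strip().lower() == item.answer.strip().lower():
            correct += 1
    return correct / len(items)
```

In a real setup, `render` would rasterize the SVG and `policy_answer` would query a VLM; here they are left as injectable callables so the scoring logic stays independent of any particular renderer or model.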
Publication status:
Accepted
Peer review status:
Peer reviewed

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author


Publisher:
IEEE
Acceptance date:
2026-04-24
Event title:
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Event location:
Denver, Colorado, USA
Event website:
https://cvpr.thecvf.com/Conferences/2026
Event start date:
2026-06-03
Event end date:
2026-06-07


Language:
English
Pubs id:
2433608
Local pid:
pubs:2433608
Deposit date:
2026-06-15