Conference item
VCode: a multimodal coding benchmark with SVG as symbolic visual representation
- Abstract:
- Code has emerged as a precise, executable medium for language-centric tasks, leaving visual-centric coding underexplored. Conventional image representations rely on RGB pixels that capture visual appearance but offer limited symbolic abstraction. In this work, we advocate SVG code as a compact, interpretable, and executable visual representation. We introduce VCode, a benchmark that reframes multimodal understanding as code generation: given an image, a model must produce SVG that preserves symbolic meaning for downstream reasoning. VCode covers general commonsense, professional disciplines, and visual-centric perception. To assess symbolic fidelity, we propose CodeVQA, a novel evaluation protocol in which a policy model answers questions over rendered SVGs; correct answers indicate faithful symbolic preservation. We also introduce VCoder, an agentic framework that augments VLMs via test-time revision and visual tool use, yielding substantial improvements over strong baselines. The models are available at https://csu-jpg.github.io/VCode.
- Publication status:
- Accepted
- Peer review status:
- Peer reviewed
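The CodeVQA protocol described in the abstract (image to SVG, SVG to rendered image, question answering over the render) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Sample`, `codevqa_accuracy`, `image_to_svg`, `render_svg`, and `answer_question` are hypothetical, and case-insensitive exact-match scoring is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Sample:
    """One benchmark item (hypothetical structure)."""
    image_path: str
    question: str
    answer: str


def codevqa_accuracy(
    samples: Iterable[Sample],
    image_to_svg: Callable[[str], str],            # model under test: image -> SVG code
    render_svg: Callable[[str], bytes],            # rasteriser: SVG code -> rendered image
    answer_question: Callable[[bytes, str], str],  # policy model: (render, question) -> answer
) -> float:
    """Fraction of questions answered correctly over rendered SVGs."""
    correct = 0
    total = 0
    for sample in samples:
        svg_code = image_to_svg(sample.image_path)
        rendered = render_svg(svg_code)
        predicted = answer_question(rendered, sample.question)
        # Assumed scoring rule: case-insensitive exact match.
        if predicted.strip().lower() == sample.answer.strip().lower():
            correct += 1
        total += 1
    return correct / total if total else 0.0
```

The three callables are passed in so that any VLM, renderer, or policy model can be plugged in. A higher score suggests that the generated SVG preserved the information needed to answer the questions.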
- Files:
- Accepted manuscript (pdf, 25.0MB)
Authors
- Publisher:
- IEEE
- Acceptance date:
- 2026-04-24
- Event title:
- IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
- Event location:
- Denver, Colorado, USA
- Event website:
- https://cvpr.thecvf.com/Conferences/2026
- Event start date:
- 2026-06-03
- Event end date:
- 2026-06-07
- Language:
- English
- Pubs id:
- 2433608
- Local pid:
- pubs:2433608
- Deposit date:
- 2026-06-15
- ARK identifier:
Terms of use
- Copyright date:
- 2026
- Rights statement:
- This article is protected by copyright. All rights reserved.
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)