VCode: a multimodal coding benchmark with SVG as symbolic visual representation

Lin, KQ; Zheng, Y; Ran, H; Zhu, D; Mao, D; Li, L; Torr, P; Wang, AJ

Conference item

Abstract:: Code has emerged as a precise, executable medium for linguistic-centric tasks, leaving visual-centric coding underexplored. Conventional image representations rely on RGB pixels that capture visual appearance but offer limited symbolic abstraction. In this work, we advocate SVG code as a compact, interpretable, and executable visual representation. We introduce VCode, a benchmark that reframes multimodal understanding as code generation: given an image, a model must produce SVG that preserves symbolic meaning for downstream reasoning. VCode covers general commonsense, professional disciplines, and visual-centric perception. To assess symbolic fidelity, we propose CodeVQA, a novel evaluation protocol where a policy model answers questions over rendered SVGs; correct answers indicate faithful symbolic preservation. We also introduce VCoder, an agentic framework that augments VLMs via test-time revision and visual tool use, yielding substantial improvements over strong baselines. The models are available at https://csu-jpg.github.io/VCode.

Publication status:: Accepted

Peer review status:: Peer reviewed

Cite this record

APA Style

Lin, K. Q., Zheng, Y., Ran, H., Zhu, D., Mao, D., Li, L., Torr, P., & Wang, A. J. (2026). VCode: a multimodal coding benchmark with SVG as symbolic visual representation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026).

MLA Style

Lin, KQ, et al. “VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation.” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026), 2026.

Chicago Style

Lin, KQ, Y Zheng, H Ran, D Zhu, D Mao, L Li, P Torr, and AJ Wang. 2026. “VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation.” In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026). IEEE.
Access Document

Files:: Lin_et_al_2026_VCode_a_multimodal.pdf

(Accepted manuscript, pdf, 25.0MB)

Authors

Lin, KQ

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Zheng, Y

Role:: Author

Ran, H

Role:: Author

Zhu, D

Role:: Author

Mao, D

Role:: Author

Publisher:: IEEE
Acceptance date:: 2026-04-24
Event title:: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Event location:: Denver, Colorado, USA
Event website:: https://cvpr.thecvf.com/Conferences/2026
Event start date:: 2026-06-03
Event end date:: 2026-06-07

Language:: English
Pubs id:: 2433608
Local pid:: pubs:2433608
Deposit date:: 2026-06-15
ARK identifier:: ark:/29072/ora_142648ff38f346358ab266befff9fb33

Terms of use

Notes:: The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.

Licence:: CC Attribution (CC BY)
