From panels to prose: generating literary narratives from comics

Conference item

Abstract:: Comics have long been a popular form of storytelling, offering visually engaging narratives that captivate audiences worldwide. However, the visual nature of comics presents a significant barrier for visually impaired readers, limiting their access to these engaging stories. In this work, we provide a pragmatic solution to this accessibility challenge by developing an automated system that generates literary narratives from manga comics. Our approach aims to create evocative and immersive prose that not only conveys the original narrative but also captures the depth and complexity of characters, their interactions, and the vivid settings in which they reside. To this end, we make the following contributions: (1) We present a unified model, Magiv3, that excels at various functional tasks pertaining to comic understanding, such as localising panels, characters, texts, and speech-bubble tails, performing OCR, and grounding characters. (2) We release human-annotated captions for over 3300 Japanese comic panels, along with character grounding annotations, and benchmark large vision-language models in their ability to understand comic images. (3) Finally, we demonstrate how integrating large vision-language models with Magiv3 can generate seamless literary narratives that allow visually impaired audiences to engage with the depth and richness of comic storytelling.

Files:: Sachdeva_and_Zisserman_2025_From_panels_to.pdf

(Accepted manuscript, pdf, 6.0MB)

Publisher:: IEEE
Host title:: 2025 IEEE/CVF International Conference on Computer Vision (ICCV)
Pages:: 21864-21873
Publication date:: 2026-04-29
Acceptance date:: 2025-07-23
Event title:: International Conference on Computer Vision (ICCV 2025)
Event location:: Honolulu, Hawai'i, USA
Event website:: https://www.robots.ox.ac.uk/~vgg/publications/2025/Sachdeva25/sachdeva25.pdf
Event start date:: 2025-10-19
Event end date:: 2025-10-23
DOI:: 10.1109/ICCV51701.2025.02030
EISSN:: 2380-7504
ISSN:: 1550-5499
EISBN:: 9798331587758
ISBN:: 9798331587765

Language:: English
Keywords:: feeds, antennas, protocols, HTTP, internet, instant messaging, radio access networks, regional area networks, videos, communication systems
Pubs id:: 2320746
Local pid:: pubs:2320746
Deposit date:: 2025-11-10
ARK identifier:: ark:/29072/ora_75b379effa694c3aa2e59efda9079654

Copyright holder:: IEEE
Notes:: This paper was presented at the International Conference on Computer Vision (ICCV 2025), 19th-23rd October 2025, Honolulu, Hawai'i, USA. The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
