Conference item

Tails tell tales: chapter-wide manga transcriptions with character names

Abstract:

Enabling visually impaired individuals to engage with manga presents a significant challenge due to the medium's inherently visual nature. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them as essential or non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring the same characters are named consistently throughout the chapter.

To this end, we introduce: (i) Magiv2, a model that is capable of generating high-quality chapter-wide manga transcripts with named characters and significantly higher precision in speaker diarisation than prior work; (ii) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of text to corresponding tails, classifications of text as essential or non-essential, and the identity of each character box; and (iii) a new character bank dataset, which comprises over 11K characters from 76 manga series, featuring 11.5K exemplar character images in total, as well as a list of chapters in which they appear. The code, trained model, and both datasets can be found at: https://github.com/ragavsachdeva/magi

Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1007/978-981-96-0908-6_4

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Brasenose College
Role:
Author
ORCID:
0000-0002-8945-8573


Funder identifier:
https://ror.org/0439y7842
Grant:
EP/T028572/1


Publisher:
Springer
Host title:
Computer Vision – ACCV 2024
Pages:
63-80
Series:
Lecture Notes in Computer Science
Series number:
15474
Publication date:
2024-12-07
Acceptance date:
2024-09-20
Event title:
17th Asian Conference on Computer Vision (ACCV 2024)
Event location:
Hanoi, Vietnam
Event website:
https://accv2024.org/
Event start date:
2024-12-08
Event end date:
2024-12-12
DOI:
10.1007/978-981-96-0908-6_4
EISSN:
1611-3349
ISSN:
0302-9743
EISBN:
9789819609086
ISBN:
9789819609079


Language:
English
Pubs id:
2080997
Local pid:
pubs:2080997
Deposit date:
2025-01-28