Conference item
Tails tell tales: chapter-wide manga transcriptions with character names
- Abstract:
Enabling visually impaired individuals to engage with manga presents a significant challenge due to its inherently visual nature. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them as essential or non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring that the same characters are named consistently throughout the chapter.
To this end, we introduce: (i) Magiv2, a model capable of generating high-quality chapter-wide manga transcripts with named characters and with significantly higher precision in speaker diarisation than prior work; (ii) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of text to corresponding tails, classifications of text as essential or non-essential, and the identity of each character box; and (iii) a new character bank dataset, which comprises over 11K characters from 76 manga series, featuring 11.5K exemplar character images in total, as well as a list of the chapters in which they appear. The code, trained model, and both datasets can be found at: https://github.com/ragavsachdeva/magi
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Files:
- Accepted manuscript (PDF, 5.2MB)
- Publisher copy:
- 10.1007/978-981-96-0908-6_4
Authors
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/T028572/1
- Publisher:
- Springer
- Host title:
- Computer Vision – ACCV 2024
- Pages:
- 63-80
- Series:
- Lecture Notes in Computer Science
- Series number:
- 15474
- Publication date:
- 2024-12-07
- Acceptance date:
- 2024-09-20
- Event title:
- 17th Asian Conference on Computer Vision (ACCV 2024)
- Event location:
- Hanoi, Vietnam
- Event website:
- https://accv2024.org/
- Event start date:
- 2024-12-08
- Event end date:
- 2024-12-12
- DOI:
- 10.1007/978-981-96-0908-6_4
- EISSN:
- 1611-3349
- ISSN:
- 0302-9743
- EISBN:
- 9789819609086
- ISBN:
- 9789819609079
- Language:
- English
- Pubs id:
- 2080997
- Local pid:
- pubs:2080997
- Deposit date:
- 2025-01-28
- ARK identifier:
Terms of use
- Copyright holder:
- Sachdeva et al.
- Copyright date:
- 2025
- Rights statement:
- © 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
- Notes:
- This is the accepted manuscript version of the article. The final version is available online from Springer at https://dx.doi.org/10.1007/978-981-96-0908-6_4