“Tails Tell Tales: Chapter-Wide Manga Transcriptions With Character Names”, Ragav Sachdeva, Gyungin Shin, Andrew Zisserman, 2024-08-01:

Enabling visually impaired individuals to engage with manga presents a challenge due to its inherently visual nature. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (1) what is being said, i.e., detecting the texts on each page and classifying them as essential or non-essential, and (2) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring that the same characters are named consistently throughout the chapter.
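
The two sub-tasks above can be illustrated with a minimal sketch of the final assembly step, assuming detection, essential/non-essential classification, and speaker attribution have already produced per-text predictions. The `TextBox` record, its fields, the `build_transcript` helper, and the "Unknown" fallback label are hypothetical names chosen for illustration; they are not the Magiv2 API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextBox:
    """Hypothetical per-text prediction (not the Magiv2 output format)."""
    page: int                # page index within the chapter
    order: int               # reading order of the text on its page
    text: str                # transcribed text content
    essential: bool          # True for dialogue, False for e.g. sound effects
    speaker: Optional[int]   # chapter-wide character id, None if unattributed


def build_transcript(boxes: List[TextBox], names: Dict[int, str]) -> List[str]:
    """Assemble "Name: text" lines in reading order, skipping non-essential text."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b.page, b.order)):
        if not box.essential:
            continue  # drop text classified as non-essential
        if box.speaker is None:
            name = "Unknown"  # no speaker could be attributed
        else:
            # The same character id always maps to the same name, which keeps
            # naming consistent across the whole chapter.
            name = names.get(box.speaker, f"Character {box.speaker}")
        lines.append(f"{name}: {box.text}")
    return lines


boxes = [
    TextBox(page=2, order=0, text="Hello", essential=True, speaker=1),
    TextBox(page=1, order=1, text="BOOM", essential=False, speaker=None),
    TextBox(page=1, order=0, text="Wait!", essential=True, speaker=0),
    TextBox(page=2, order=1, text="...", essential=True, speaker=None),
]
names = {0: "Aiko"}  # hypothetical character-id-to-name mapping
transcript = build_transcript(boxes, names)
```

Keying the name lookup on a chapter-wide character id, rather than on per-page detections, is what makes a speaker keep one name from the first page to the last in this sketch.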

To this end, we introduce: (1) Magiv2, a model that is capable of generating high-quality chapter-wide manga transcripts with named characters and with higher precision in speaker diarization than prior work; (2) an extension of the PopManga evaluation dataset, which now includes annotations for speech-bubble tail boxes, associations of text to corresponding tails, classifications of text as essential or non-essential, and the identity of each character box; and (3) a new character bank dataset, which comprises over 11K characters from 76 manga series, featuring 11.5K exemplar character images in total, as well as a list of chapters in which they appear.

The code, trained model, and both datasets can be found at: https://github.com/ragavsachdeva/magi.