MultiLingual Scene Text Detection via Group-Specific Models
Abstract
Multilingual scene text detection has received significant attention in recent years. The challenge is to design text detectors that handle wide variability in font size, font style, color, and background complexity, as well as the presence of multiple languages in the same scene. Unlike current approaches that rely on a single model to detect text in all languages, we propose a group-specific modeling strategy for multilingual text detection. Our method clusters languages with similar visual structural characteristics, trains a dedicated detector for each group, and then fuses their results through a general object detector. We evaluate the proposed method against state-of-the-art methods on the MLT-2019 dataset. The experiments demonstrate the effectiveness of our approach, which outperforms the single-model approach widely used in the literature by at least 6.06 percentage points when evaluated per language. Our approach also surpasses the state-of-the-art method in terms of F1-macro on MLT-2019 and is the best-performing method in four of the seven languages: Arabic, Bangla, Hindi, and Chinese.
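To make the F1-macro figure concrete, the following is a minimal sketch of how a macro-averaged F1 could be computed from per-language detection counts. It assumes that F1-macro here means the unweighted mean of per-language F1 scores; the abstract does not spell out the evaluation protocol. The function names and the counts in `example_counts` are hypothetical and are not results from the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one language
Counts = Tuple[int, int, int]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from detection counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Unweighted mean of per-language F1, so every language counts equally."""
    if not counts:
        raise ValueError("counts must contain at least one language")
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Counts]) -> float:
    """F1 over pooled counts, so languages with more text instances dominate."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_score(tp, fp, fn)


# Hypothetical counts for illustration only (not taken from MLT-2019).
example_counts: Dict[str, Counts] = {
    "Arabic": (80, 20, 20),
    "Latin": (900, 100, 100),
}

if __name__ == "__main__":
    print(f"macro F1: {macro_f1(example_counts):.4f}")
    print(f"micro F1: {micro_f1(example_counts):.4f}")
```

Under this assumption, a low-resource language with few text instances weighs as much as a high-resource one, which is why a macro average is a natural fit for per-language comparisons.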