NE-OCR: Unified Optical Character Recognition for 10 Languages of Northeast India

Badal Nyalang

Abstract

We present NE-OCR, a unified optical character recognition model for 10 Northeast Indian languages, represented across 12 language-script pairs spanning 4 scripts, along with Hindi and English as anchor languages. NE-OCR is built on a Vision Transformer backbone (ViTSTR-Base, 86M parameters) and trained on approximately 1.34 million text-image pairs constructed from native-language corpora. On a held-out benchmark of 24,000 test samples (2,000 per language-script pair), NE-OCR achieves a mean Character Accuracy (ChA) of 94.99%, peaking at 98.85% on Khasi, with an inference latency of 17.2 ms per image on an A40 GPU, the fastest among all evaluated systems. We benchmark against four baseline systems: EasyOCR, Tesseract 5, TrOCR-large-printed, and Chandra. NE-OCR outperforms all baselines on 9 Northeast Indian language-script pairs and is competitive on the English and Hindi anchor languages. We additionally present a qualitative analysis of DeepSeek OCR 2 and Chandra as representatives of the vision-language model (VLM) paradigm, demonstrating that VLMs fail on unseen scripts by hallucinating document structure rather than by producing recognition errors. Model weights are publicly available under CC-BY-4.0.
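The abstract reports Character Accuracy (ChA) without defining it. One common convention, assumed here and not confirmed by the source, is ChA = 1 - CER, where CER is the Levenshtein edit distance between prediction and reference divided by the reference length. The minimal sketch below follows that assumption; the function names are illustrative, and it counts Unicode code points rather than grapheme clusters, which may differ from the authors' setup for scripts with combining marks.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_accuracy(prediction: str, reference: str) -> float:
    """ChA under the assumed definition 1 - CER, clipped to [0, 1]."""
    if not reference:
        return 1.0 if not prediction else 0.0
    cer = levenshtein(prediction, reference) / len(reference)
    return max(0.0, 1.0 - cer)


def mean_character_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average per-sample ChA over (prediction, reference) pairs."""
    scores = [character_accuracy(p, r) for p, r in pairs]
    return sum(scores) / len(scores)
```

Averaging per sample, as here, weights every test image equally; pooling all edits and all reference characters before dividing is an equally plausible convention and generally yields a different number.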

Version published to 10.21203/rs.3.rs-9167777/v1 on Research Square
Mar 20, 2026

