MINT: A Multilingual Indic Neural Transformer for Abstractive Summarization Under Memory Constraints

Abstract

We present MINT (Multilingual Indic Neural Transformer), a compact 14.7M-parameter encoder-decoder architecture for abstractive summarization across seven Indic languages. MINT is designed to operate within the memory envelope of a single commodity NVIDIA T4 GPU (15 GB VRAM), addressing the paradox that models serving the most resource-constrained communities are themselves the most resource-intensive to deploy. The architecture incorporates Rotary Position Embeddings (RoPE), SiLU feed-forward activations, DropPath regularization, weight tying, and a custom 32,000-token SentencePiece Unigram tokenizer trained on balanced Indic corpora. Training proceeds in two phases on the XL-Sum BBC dataset across Hindi, Bengali, Marathi, Tamil, Telugu, Punjabi, and Urdu: a fluency phase (epochs 1-15) using linear warmup with cosine decay, followed by a refinement phase (epochs 16-25) with a flat low learning rate and a combined coverage-attention entropy loss that jointly penalizes repetition and hallucination. We conduct the first identical-regime comparison in Indic summarization, fine-tuning both IndicBART (440M parameters) and mT5-small (556M parameters) under the same loss function, optimizer, decoding strategy, and data pipeline as MINT’s refinement phase. On the XL-Sum test set, MINT achieves an average ROUGE-1 of 0.1187 at epoch 15, rising to 0.1302 on validation after full refinement, and reaches approximately 84.8% of IndicBART’s ROUGE-1 (0.1409) on the six overlapping languages while using only 3.3% of its parameters. A critical methodological contribution of this work is the demonstration that the standard Google rouge_score library returns zero for all Indic scripts because of its English-centric tokenization; we implement and advocate whitespace-based ROUGE evaluation as the correct approach.
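As an illustration of the evaluation fix described above, whitespace-tokenized ROUGE-1 can be sketched in a few lines of plain Python. The function name and example sentences below are illustrative, not taken from the released code; the point is that splitting on whitespace retains Devanagari (and other Indic-script) tokens that an English-centric tokenizer would discard.

```python
from collections import Counter

def rouge1_whitespace(reference: str, candidate: str) -> float:
    """ROUGE-1 F1 using whitespace tokenization, which (unlike the
    default English-centric tokenization in Google's rouge_score)
    does not strip non-Latin-script tokens."""
    ref_tokens = Counter(reference.split())
    cand_tokens = Counter(candidate.split())
    if not ref_tokens or not cand_tokens:
        return 0.0
    overlap = sum((ref_tokens & cand_tokens).values())  # clipped unigram matches
    precision = overlap / sum(cand_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A Hindi reference/candidate pair (illustrative): default rouge_score
# tokenization would drop these tokens entirely and report 0.
ref = "भारत में आज बारिश हुई"
cand = "भारत में बारिश हुई"
print(round(rouge1_whitespace(ref, cand), 4))  # → 0.8889
```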
MINT further achieves a BERTScore F1 of 0.8497 (computed with XLM-RoBERTa-Large) and a LaBSE embedding cosine similarity of 0.4306, confirming that the generated summaries carry semantic meaning even when surface-overlap metrics are modest. All code and checkpoints are publicly released.
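The two-phase learning-rate schedule described in the abstract (linear warmup with cosine decay for epochs 1-15, then a flat low rate for epochs 16-25) can be sketched as follows. The specific constants — peak rate, warmup span, and the refinement floor — are assumptions for illustration, not values reported in the paper.

```python
import math

def mint_lr(epoch: int, peak=3e-4, floor=3e-5,
            warmup_epochs=2, fluency_epochs=15) -> float:
    """Per-epoch learning rate for the two-phase regime (constants are
    illustrative assumptions, not the paper's reported hyperparameters)."""
    if epoch > fluency_epochs:
        # Refinement phase (epochs 16-25): hold a flat low learning rate.
        return floor
    if epoch <= warmup_epochs:
        # Linear warmup from 0 toward the peak rate.
        return peak * epoch / warmup_epochs
    # Cosine decay from the peak down to the floor over the rest of phase 1.
    progress = (epoch - warmup_epochs) / (fluency_epochs - warmup_epochs)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

for e in (1, 2, 8, 15, 20):
    print(e, f"{mint_lr(e):.2e}")
```

By construction, the cosine segment lands exactly on the floor at epoch 15, so the transition into the refinement phase is continuous rather than a discontinuous drop.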
