An Intelligent Bidirectional Transliteration System for Ancient Tamil-Brahmi and Modern Tamil using a Rule-Based Unicode Model

Abstract

Ancient Tamil-Brahmi inscriptions constitute a vital part of South Asia’s epigraphical heritage, yet the script remains largely inaccessible to non-specialists. A primary obstacle is the lack of effective digital tools for bidirectional transliteration between Ancient Tamil-Brahmi and Modern Tamil, two scripts whose forms have diverged over centuries of evolution. This study proposes a rule-based computational framework that performs bidirectional transliteration between Ancient Tamil-Brahmi and Modern Tamil using Unicode character mapping, consonant–vowel composition rules, and virama handling. The system operates directly on Unicode text and supports the full Brahmi block, including the Old Tamil extensions, so that character-level conversion is lossless. Experimental evaluation on a parallel Brahmi–Tamil dataset shows a BLEU score of 1.0000, 100% character accuracy, and a character error rate (CER) of 0.0 in both transliteration directions. These results indicate that the system is immediately applicable to digital archiving, epigraphic research, and broader cultural heritage preservation. Future work will focus on integrating the framework with Optical Character Recognition (OCR) technologies and extending support to additional regional Brahmi variants.
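The abstract does not list the rule set itself, so the following is only a minimal sketch of how a one-to-one Unicode mapping with virama handling and a CER check might look. The mapping table is a tiny illustrative subset, the function names (`tamil_to_brahmi`, `brahmi_to_tamil`, `cer`) are hypothetical, and the choice of U+11046 (BRAHMI VIRAMA) rather than U+11070 (the Old Tamil virama) is an assumption, not the authors' documented design.

```python
# Illustrative subset only: Modern Tamil code point -> Brahmi code point.
# A real system would have to cover the whole Brahmi block.
TAMIL_TO_BRAHMI = {
    0x0B85: 0x11005,  # letter A
    0x0B95: 0x11013,  # letter KA
    0x0BAE: 0x1102B,  # letter MA
    0x0BBE: 0x11038,  # vowel sign AA
    0x0BBF: 0x1103A,  # vowel sign I
    0x0BCD: 0x11046,  # virama / pulli (assumed; U+11070 is the Old Tamil virama)
}

# The reverse table is derived, so both directions stay consistent.
BRAHMI_TO_TAMIL = {b: t for t, b in TAMIL_TO_BRAHMI.items()}

# The mapping must be one-to-one for the round trip to be lossless.
assert len(BRAHMI_TO_TAMIL) == len(TAMIL_TO_BRAHMI)


def tamil_to_brahmi(text: str) -> str:
    """Map each known Tamil code point to Brahmi; leave others unchanged."""
    return text.translate(TAMIL_TO_BRAHMI)


def brahmi_to_tamil(text: str) -> str:
    """Map each known Brahmi code point to Tamil; leave others unchanged."""
    return text.translate(BRAHMI_TO_TAMIL)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        curr = [i]
        for j, h in enumerate(hypothesis, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# "amma": A, MA + virama (vowel suppressed), MA + vowel sign AA.
amma_tamil = "\u0B85\u0BAE\u0BCD\u0BAE\u0BBE"
amma_brahmi = tamil_to_brahmi(amma_tamil)
round_trip = brahmi_to_tamil(amma_brahmi)
```

Because both scripts encode a consonant followed by a combining vowel sign or virama, a per-code-point table is enough for this subset; the round trip returns the original string and `cer(amma_tamil, round_trip)` is zero.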
