The Ottoman-Turkish Transliteration using Traditional NLP Techniques

Abstract

Ottoman-Turkish transliteration is a relatively new problem. Making the large body of historical documents, books, newspapers, and magazines accessible to a wider audience unfamiliar with the Ottoman script requires transliterating that script into the Latin-based Turkish script. This study employs traditional NLP techniques to develop a dictionary-based Ottoman-Turkish transliteration system. On a dataset of 2,403 sentences and 31K words, we achieved a Word Error Rate (WER) of 20.69% (raw) and 6.31% (normalized), a Character Error Rate (CER) of 6.46% (raw) and 3.01% (normalized), and a BLEU score of 51.90 (raw) and 77.18 (normalized). These results indicate that the proposed system performs promisingly on Ottoman-Turkish transliteration.
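
The abstract reports WER and CER in raw and normalized form but does not say how they were computed. The following is a minimal sketch, assuming the standard Levenshtein-based definitions of both metrics. The `normalize` function is hypothetical: it lowercases with Turkish dotted and dotless I handling and strips punctuation, which may differ from the normalization the authors actually applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word Error Rate: word-level edit distance over the reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character Error Rate: character-level edit distance over the reference length."""
    return edit_distance(ref, hyp) / len(ref)


def normalize(text):
    """Hypothetical normalization (an assumption, not the authors' procedure):
    Turkish-aware lowercasing, punctuation removal, whitespace collapsing."""
    # Map dotted/dotless capital I explicitly; plain str.lower() mishandles them.
    text = text.replace("İ", "i").replace("I", "ı").lower()
    text = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    return " ".join(text.split())


reference = "Kitabı okudum."
hypothesis = "kitabı okudum"
raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize(reference), normalize(hypothesis))
```

In this toy pair, the case and punctuation differences count as errors in the raw score and disappear after normalization. That is one plausible reason for a gap between raw and normalized figures like the one reported above, though the source does not confirm the cause.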
