Geographically‑Informed Multilingual Neural Machine Translation
Abstract
This work introduces an approach that integrates geographic coordinates into a multilingual neural machine translation architecture alongside special tokens (linguistic tags). The approach enables modeling of language continua and hypothetical language varieties through geospatial interpolation across supported languages. We fine-tuned a Transformer model on a custom dataset of 31 languages annotated with geographic vectors and three types of tags (family, group, script), enabling the model to condition translations on spatial and linguistic features. Our experiments demonstrate that geographic embeddings encourage more coherent language clustering in the model's latent space, facilitating smoother interpolation between more than two related languages (e.g., across the Germanic or Slavic continua). Additionally, the model exhibits further capabilities, such as performing partial transliteration between scripts. However, given the amount of data and training used, the model's capabilities remain insufficient for generating non-existent, hypothetical language varieties under unusual conditions (such as a Balkan Germanic variety).
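The abstract does not spell out how the geographic conditioning is wired into the model, so the following is only a minimal sketch of one plausible realization, assuming a PyTorch Transformer: source sequences already carry the family/group/script tag tokens, and a target-variety coordinate vector is projected into the model dimension and added to the encoder input embeddings. All module names, dimensions, and coordinates below are illustrative assumptions, not taken from the paper.

```python
# Sketch (assumptions, not the paper's actual implementation): geographic
# conditioning via a learned projection of coordinates added to token embeddings.
import math
import torch
import torch.nn as nn


class GeoConditionedEncoderInput(nn.Module):
    """Builds encoder inputs from tagged token ids plus a geographic vector."""

    def __init__(self, vocab_size: int, d_model: int = 512, geo_dim: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Small MLP projecting the geographic vector (e.g. lat/lon) into d_model.
        self.geo_proj = nn.Sequential(
            nn.Linear(geo_dim, d_model), nn.Tanh(), nn.Linear(d_model, d_model)
        )
        self.d_model = d_model

    def forward(self, token_ids: torch.Tensor, geo: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len), already prefixed with family/group/script tags
        # geo: (batch, geo_dim), coordinates of the requested target variety
        x = self.embed(token_ids) * math.sqrt(self.d_model)
        geo_bias = self.geo_proj(geo).unsqueeze(1)   # (batch, 1, d_model)
        return x + geo_bias                          # broadcast over the sequence


def interpolate_geo(coord_a: torch.Tensor, coord_b: torch.Tensor, alpha: float) -> torch.Tensor:
    """Linear interpolation between two language coordinates, used to probe the
    continuum between related varieties (alpha=0 -> A, alpha=1 -> B)."""
    return (1.0 - alpha) * coord_a + alpha * coord_b


if __name__ == "__main__":
    enc_in = GeoConditionedEncoderInput(vocab_size=32000)
    tokens = torch.randint(0, 32000, (2, 16))         # dummy tagged source batch
    berlin = torch.tensor([[52.52, 13.40]])           # illustrative coordinates only
    amsterdam = torch.tensor([[52.37, 4.90]])
    midpoint = interpolate_geo(berlin, amsterdam, 0.5).repeat(2, 1)
    print(enc_in(tokens, midpoint).shape)             # torch.Size([2, 16, 512])
```

Under this assumed design, sweeping `alpha` between the coordinates of two related languages is what would correspond to the geospatial interpolation across a continuum described above; the paper's actual injection point and coordinate encoding may differ.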