Measuring the Information Density of Interlanguage: An Entropy Analysis


Abstract

Interlanguage development is often assessed through structural counts that only partially capture how learner language is organized probabilistically. This study proposes a multi-level framework for measuring interlanguage information density using entropy-based metrics. A corpus of 150 L2 English argumentative essays from B1, B2, and C1 learners was compared with a genre-matched native-speaker corpus of 50 essays. Four indicators were examined: lexical entropy (Hₗₑₓ), grammatical divergence from a native reference distribution via POS trigrams (KL₍gram₎), compression ratio (CR), and positional concentration index (PCI). To model native variability more defensibly, KL₍gram₎ for each L1 text was calculated against a leave-one-out L1 reference distribution. Results showed a clear developmental gradient: lexical entropy and positional concentration increased with proficiency, whereas grammatical divergence and compression ratio decreased. Mixed-effects models confirmed that these shifts were robust effects of proficiency. The findings support a probabilistic view of interlanguage development and offer a principled diagnostic framework for evaluating communicative efficiency in L2 writing.
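The metrics named in the abstract can be sketched computationally. The following is a minimal Python illustration, not the paper's implementation: it computes lexical entropy as Shannon entropy over word frequencies, compression ratio via zlib, and KL divergence over POS trigrams with a leave-one-out native reference, as the abstract describes for KL₍gram₎. The add-α smoothing constant and all function names are illustrative assumptions; the positional concentration index (PCI) is omitted because its definition is given only in the full paper.

```python
import math
import zlib
from collections import Counter

def lexical_entropy(tokens):
    """Shannon entropy (bits) of the word-frequency distribution (H_lex)."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def compression_ratio(text):
    """Compressed size over raw size; more redundant text compresses further."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

def pos_trigrams(tags):
    """Counter of POS-tag trigrams for one text."""
    return Counter(zip(tags, tags[1:], tags[2:]))

def kl_divergence(p_counts, q_counts, alpha=0.5):
    """KL(P || Q) in bits over trigram counts, with add-alpha smoothing
    (alpha is an illustrative choice, not the paper's)."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for t in vocab:
        p = (p_counts.get(t, 0) + alpha) / p_total
        q = (q_counts.get(t, 0) + alpha) / q_total
        kl += p * math.log2(p / q)
    return kl

def loo_kl(l1_tag_sequences):
    """KL of each native text against the pooled counts of all *other*
    native texts (the leave-one-out reference the abstract describes)."""
    per_text = [pos_trigrams(tags) for tags in l1_tag_sequences]
    totals = Counter()
    for c in per_text:
        totals.update(c)
    # Counter subtraction drops keys that fall to zero; smoothing in
    # kl_divergence handles trigrams unseen in the reference.
    return [kl_divergence(c, totals - c) for c in per_text]
```

For learner texts, the same `kl_divergence` would instead be computed against the full pooled native distribution, since no learner text contributes to the reference.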