Standardized Assessment of LLM English Proficiency
Abstract
Large language models (LLMs) are increasingly used in language learning and assessment, yet their English proficiency is seldom reported against interpretable proficiency standards. We adopt China's Standards of English Language Ability (CSE) as a framework for assessing the proficiency levels and subskills of LLMs. The resulting test, CSEBench, comprises 624 expert-annotated multiple-choice items spanning CSE Levels 2–7. Each item carries metadata, including a difficulty level and subskill labels covering vocabulary, syntax, phonology, and cohesion/discourse. Critically, the dataset also includes test responses from 2,050 middle school and sophomore college students who are learning English as a second language. We evaluate closed-source models, open-source baselines, and enhanced open-source variants that incorporate additional supervision and external knowledge. Results show a clear proficiency divide: after mapping model scores to CSE levels, closed-source models consistently reach CSE Level 6, whereas most open-source baselines cluster around CSE Levels 3–4. A follow-up cognitive diagnostic analysis reveals that while closed-source LLMs exhibit broad competence across subskills, open-source models display persistent deficits, most pronounced in phonology. Crucially, these weaknesses can be substantially reduced through targeted enhancements. CSEBench thus offers a proficiency-interpretable testbed for reporting LLM English ability and diagnosing subskill gaps.