The Evolution of Written Chinese (1910-1945): A Computational Study of Newspapers
Abstract
Between 1910 and 1945, written Chinese changed rapidly, shifting from Classical Chinese, a rigid and archaic form used since the days of Confucius, to Modern vernacular Chinese, a closer representation of spoken Chinese, especially the Mandarin dialect. This change is difficult to quantify, however, because it unfolded over several decades and across a massive corpus of texts. This study measures it by applying a weighted cosine similarity formula to thousands of Chinese newspaper issues from the period. Using two optical character recognition (OCR) programs, I converted thousands of publications from seven newspapers into a machine-readable format and then measured their similarity to one another year by year. The results reveal several trends showing that written Chinese did not progress linearly during this era. On average, the newspapers became less similar to one another in the years following the adoption of vernacular Chinese, before converging again around the late 1930s. Furthermore, several of the newspapers were consistently more similar to one another than to the rest, though it is unclear whether this was in fact due to dialectal similarities. These findings provide new insights into how Chinese evolved during this pivotal era and demonstrate how computational methods can enrich historical inquiry.
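
The abstract does not spell out the weighted cosine similarity formula, so the following is only a minimal sketch of one common form, not the study's actual implementation. It assumes that each text is represented by its character counts and that each character carries a non-negative weight; the function names `char_counts` and `weighted_cosine`, the restriction to the basic CJK Unicode block, and the example weights are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def char_counts(text):
    """Count characters in the basic CJK block (U+4E00 to U+9FFF).

    Assumption: punctuation, digits, and whitespace are ignored.
    """
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def weighted_cosine(a, b, weights=None):
    """Weighted cosine similarity between two count vectors.

    sim = sum(w_i * a_i * b_i)
          / (sqrt(sum(w_i * a_i^2)) * sqrt(sum(w_i * b_i^2)))

    `weights` maps a character to a non-negative weight; characters
    missing from the mapping default to 1.0 (hypothetical scheme).
    Returns 0.0 when either weighted vector has zero length.
    """
    def w(key):
        return 1.0 if weights is None else weights.get(key, 1.0)

    keys = set(a) | set(b)
    dot = sum(w(k) * a[k] * b[k] for k in keys)
    norm_a = math.sqrt(sum(w(k) * a[k] ** 2 for k in keys))
    norm_b = math.sqrt(sum(w(k) * b[k] ** 2 for k in keys))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy inputs, not data from the study.
classical = char_counts("之乎者也")
vernacular = char_counts("的了是我")
print(weighted_cosine(classical, classical))
print(weighted_cosine(classical, vernacular))
```

Under these assumptions, a yearly comparison would pool each newspaper's text for a given year into one count vector and compute the similarity for every pair of newspapers. Down-weighting very common characters is one possible use of `weights`, but which weighting the study applied is not stated in the abstract.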