From Text to Sectors: Classifying 140 Years of Swiss Firm Registrations

Abstract

Disentangling structural economic shifts from persistent factors requires firm-level data of sufficient granularity and historical depth. In this study, we address this challenge by employing large language models (LLMs) to systematically classify business purpose descriptions from the multilingual Swiss Commercial Registry into standardized sectoral categories. Drawing on historical data spanning over 140 years, we classify more than two million firm registrations, providing granular coverage of the entire Swiss economy. We report three principal findings. First, zero-shot LLMs exhibit strong classification performance across sectors and languages, and demonstrate temporal robustness in predictive accuracy. Second, we trace the economic transformation of Switzerland, consistent with broader European trends, but documented here at the unusually fine-grained level of the individual firm. Third, we identify persistent cultural differences in sectoral entrepreneurship preferences along the Swiss language border. Ultimately, this paper demonstrates that LLMs can unlock previously untapped administrative data, offering new perspectives for historical economic analysis.
