LLM-Based Measurement of Latent Attributes in Trade Data
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Trade data are available at a high level of disaggregation, allowing scholars to examine flows of highly specific goods. Yet the sheer number of goods classifications (5,000+) makes it difficult to analyze trade flows and tariff policy at a mid-level of aggregation beyond a few existing categorizations. Here, we outline a method that can scale---not merely classify---traded goods on researcher-defined dimensions that are orthogonal to existing classification schemes. We propose that the embedded knowledge in large language models (LLMs) can be used to conduct pairwise comparisons (PWCs) of Harmonized System (HS) product descriptions by determining their relative proximity to a specific concept. A Bayesian Bradley--Terry model then uses these PWCs to place individual items on a latent scale of interest. These estimates and their associated uncertainty can then be used for downstream descriptive or causal analysis.