Semantic Drift and Reporting Stability in VAERS: An Embedding-Based Analysis of Vaccine Safety Narratives
Abstract
The Vaccine Adverse Event Reporting System (VAERS) is critical to post-market vaccine surveillance, particularly for understanding the progression of rare but severe adverse events. Traditional analyses rely on event frequencies and odds ratios to detect safety signals, with a focus on biological causality. The free-text narratives in these reports, however, carry underused information about the behavioral factors that shape reporting dynamics. This study quantifies temporal and semantic drift in vaccine safety narratives and examines its relationship to epidemiological reporting patterns. Narrative texts for influenza, MMR and pneumococcal vaccines were preprocessed and converted into numerical vectors with embedding models, so that similarity between reports could be quantified. Cosine distance between embeddings was then used to compute a distortion index, a composite measure of linguistic heterogeneity and volatility. The index was correlated with behavioral indicators such as text length, reporting lag and shifts in the distribution of reported symptoms, and its robustness was tested with bootstrap resampling. Finally, changepoint detection identified distinct semantic regimes, showing that language drift can serve as a measurable signal of the behavioral dynamics of pharmacovigilance.
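The abstract does not give the exact formula for the distortion index or name the changepoint method. The sketch below is a minimal, pure-Python illustration of the kind of pipeline it describes. It assumes that report embeddings have already been produced by some embedding model and grouped into time windows (for example, one window per month), with at least two reports per window. The composite `distortion_index` (a weighted sum of within-window heterogeneity and between-window centroid volatility), the i.i.d. percentile bootstrap for a Pearson correlation, and the single mean-shift changepoint search are illustrative choices, not the authors' method. All function names and parameters are hypothetical.

```python
import math
import random
from statistics import mean

Vector = list[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of the report embeddings in one window."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def heterogeneity(vectors: list[Vector]) -> float:
    """Mean pairwise cosine distance within a window (needs >= 2 reports)."""
    n = len(vectors)
    return mean(cosine_distance(vectors[i], vectors[j])
                for i in range(n) for j in range(i + 1, n))


def distortion_index(windows: list[list[Vector]], weight: float = 0.5) -> list[float]:
    """HYPOTHETICAL composite: weight * heterogeneity + (1 - weight) * volatility,
    where volatility is the cosine distance between consecutive window centroids
    (set to 0 for the first window, which has no predecessor)."""
    cents = [centroid(w) for w in windows]
    het = [heterogeneity(w) for w in windows]
    vol = [0.0] + [cosine_distance(a, b) for a, b in zip(cents, cents[1:])]
    return [weight * h + (1 - weight) * v for h, v in zip(het, vol)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; both inputs must have non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for pearson(x, y), resampling (x, y) pairs i.i.d."""
    rng = random.Random(seed)
    n, stats = len(x), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:  # skip zero-variance resamples
            stats.append(pearson(xs, ys))
    if not stats:
        raise ValueError("no non-degenerate bootstrap resamples")
    stats.sort()
    m = len(stats)
    return stats[int(alpha / 2 * m)], stats[min(m - 1, int((1 - alpha / 2) * m))]


def single_changepoint(series: list[float], min_size: int = 2) -> int:
    """Split index k minimising the summed squared error of series[:k] and
    series[k:] around their own means (one mean-shift changepoint)."""
    def sse(seg):
        m = mean(seg)
        return sum((s - m) ** 2 for s in seg)

    best_k, best_cost = -1, math.inf
    for k in range(min_size, len(series) - min_size + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

Because the windows form a time series, an i.i.d. bootstrap ignores autocorrelation; a block bootstrap would be a more conservative robustness check. Several regimes, rather than one, can be found by applying `single_changepoint` recursively to each resulting segment (binary segmentation).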