Contrastive learning of adverse events to provide effective and interpretable vector representations for machine-assisted pharmacovigilance

Abstract

Post-marketing surveillance is crucial for drug safety, yet pharmacovigilance tools rely solely on text-based data, which may limit the applicability of contemporary machine learning methods in supporting decision-making. Here, we adapt contrastive learning algorithms to generate adverse event vector representations from spontaneous reports, intended as general machine-readable resources for pharmacovigilance applications. We analyse the resulting representations through density-based clustering, semantic evaluation and comparison of multivariate dispersions. These analyses reveal patterns that reflect both functional and causal relations between adverse events, and show that the representations capture drug-safety-related information better than existing medical terminologies and encoder-only large language models (LLMs). Furthermore, we demonstrate the applicability of the representations as input features in our downstream model, which outperforms both the reporting odds ratio method commonly used by regulatory agencies (AUROC: 0.88 vs 0.75) and LLM-based representations (AUROC: 0.88 vs 0.83) on drug–event causality prediction benchmarks. This is the first demonstration of an interpretable adverse event vector representation that can be used to train arbitrary models, enabling wider and more effective applications of machine learning in pharmacovigilance.
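
For readers unfamiliar with the baseline named in the abstract, the reporting odds ratio (ROR) is a standard disproportionality measure computed from a 2×2 table of spontaneous report counts. The following is a minimal sketch of that textbook calculation, not the authors' implementation; the function name `reporting_odds_ratio`, its argument names, and the counts in the usage note are illustrative assumptions and do not come from the article.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Return (ROR, lower, upper) for a 2x2 table of report counts.

    Hypothetical helper for illustration only.
    a: reports mentioning the drug and the event
    b: reports mentioning the drug without the event
    c: reports mentioning the event without the drug
    d: reports mentioning neither
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        # The log-based interval is undefined when any cell is zero.
        raise ValueError("all four cell counts must be positive")

    # Odds of the event among drug reports divided by odds among other reports.
    ror = (a * d) / (b * c)

    # Standard error of ln(ROR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper
```

With made-up counts `reporting_odds_ratio(20, 80, 10, 90)`, the point estimate is 20·90 / (80·10) = 2.25. A lower confidence bound above 1 is a commonly used signal criterion, though thresholds vary between organisations.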
