deepNGS Navigator: Exploring antibody NGS datasets using deep contrastive learning

Abstract

High-throughput sequencing reveals how B cells adapt and evolve in response to antigens by generating B-cell receptor (BCR) sequences at an unprecedented scale. As BCR datasets grow to millions of sequences, efficient computational methods become crucial for analyzing and understanding complex patterns within the data. One important aspect of antibody sequence analysis is detecting clonal families, or clusters of related sequences, whether they come from immunization, synthetic libraries, or even ML-generated datasets. Such analysis helps us understand how sequences are evolutionarily connected and how they might have been selected or evolved. Here we introduce deepNGS Navigator, a computational tool that leverages language models and contrastive learning to transform large datasets of antibody sequences into 2D representations. The resulting 2D maps offer an intuitive visualization of the overall diversity of the input datasets, which can be clustered based on sequence distances and densities across the map. Beyond grouping related sequences, the 2D maps can also point to evolutionary trajectories and capture mutational patterns among closely related sequences. By analyzing properties such as charge, hydrophobicity, number of sequence neighbors, and read counts, the maps highlight which clusters are most promising for further investigation while also flagging anomalous or noisy sequences that carry higher risk. We demonstrate deepNGS Navigator’s utility on various datasets, including: 1) a synthetic library from yeast display targeting HER2, 2) a machine learning-generated dataset with a hierarchical tree structure, 3) NGS sequences from a llama immunized against COVID RBD, 4) human naive and memory B-cell sequences, and 5) an in silico dataset simulating B-cell clonal lineages.
