A large-scale systematic survey of SARS-CoV-2 antibodies reveals recurring molecular features

Abstract

In the past two years, global research to combat the COVID-19 pandemic has led to the isolation and characterization of numerous human antibodies to the SARS-CoV-2 spike. This enormous collection of antibodies provides an unprecedented opportunity to study the antibody response to a single antigen. By mining information from 88 research publications and 13 patents, we have assembled a dataset of ∼8,000 human antibodies to the SARS-CoV-2 spike from >200 donors. Analysis of antibody targeting of different domains of the spike protein reveals a number of common (public) responses to SARS-CoV-2, exemplified by recurring IGHV/IGK(L)V pairs, CDR H3 sequences, IGHD usage, and somatic hypermutation. We further present a proof-of-concept for prediction of antigen specificity, using deep learning to differentiate sequences of antibodies to SARS-CoV-2 spike from those to influenza hemagglutinin. Overall, this study not only provides an informative resource for antibody and vaccine research, but also fundamentally advances our molecular understanding of public antibody responses to a viral pathogen.

SciScore for 10.1101/2021.11.26.470157:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Antibodies
Sentences	Resources
As of September 2021, there were 2,582 human SARS-CoV-2 antibodies in CoV-AbDab.	SARS-CoV-2 suggested: None
Training detail: SARS-CoV-2 S antibodies and influenza HA antibodies with complete information for all six CDR sequences were identified.	influenza HA suggested: None
The following hyper-parameters were used for model training: Using the same training set, validation set and test set, the model performance of using the following inputs was compared: Performance Metrics: For evaluating model performance, S antibodies and HA antibodies were considered “positive” and “negative”, respectively.	HA suggested: None
Software and Algorithms
Sentences	Resources
Putative germline genes were identified by IgBLAST [66].	IgBLAST suggested: (IgBLAST, RRID:SCR_002873)
Sequence logos were generated by Logomaker in Python [68].	Python suggested: (IPython, RRID:SCR_001658)
Sequences of each antibody were from the original papers (Data S2) or NCBI GenBank database (www.ncbi.nlm.nih.gov/genbank) [52].	NCBI GenBank suggested: (NCBI GenBank via FTP, RRID:SCR_010535)
Area under the curves of ROC (i.e. ROC AUC) and PR (i.e. PR AUC) were computed using the “keras.metrics” module in TensorFlow [73].	TensorFlow suggested: (tensorflow, RRID:SCR_016345)
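
The two metrics named in the last row can be illustrated with a minimal sketch. The manuscript reports using the "keras.metrics" module; the code below does not call Keras and instead computes the quantities directly in plain Python to show what they measure. ROC AUC is computed as the fraction of positive/negative pairs ranked correctly, and the PR curve is summarized by average precision, which may differ slightly from the threshold-based estimate Keras produces. Following the report, S antibodies are labeled 1 ("positive") and HA antibodies 0 ("negative"). The function names, labels, and scores are hypothetical toy values, not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision at the rank of each positive.

    This is a step-wise summary of the precision-recall curve. Tied scores
    are ranked in input order here, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy example (assumed values): 1 = S antibody, 0 = HA antibody.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_roc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

In this toy case one HA antibody is scored above one S antibody, so both metrics fall below 1.0. With a strongly imbalanced test set the two numbers can diverge, which is one reason to report both.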

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Given the diverse types of public antibody responses to SARS-CoV-2 S, we need to acknowledge the limitation of using the conventional strict definition of public clonotype to study public antibody responses. Public antibody response to different antigens can have very different sequence features. For example, IGHV6-1 and IGHD3-9 are signatures of public antibody response to influenza virus [24, 60-62], whereas IGHV3-23 is frequently used in antibodies to Dengue and Zika viruses [63]. In contrast, these germline genes are seldom used in the antibody response to SARS-CoV-2 as compared to the naïve baseline (Figure 1B-C and Figure 3A). Since the binding specificity of an antibody is determined by its structure, which in turn is determined by its amino acid sequence, the antigen specificity of an antibody can theoretically be identified based on its sequence. This study provides a proof-of-concept by training a deep learning model to distinguish between SARS-CoV-2 S antibodies and influenza HA antibodies, solely based on primary sequence information. Technological advancements, such as the development of single-cell high-throughput screen using the Berkeley Lights Beacon optofluidics device [64] and advances in paired B-cell receptor sequencing [65], have been accelerating the speed of antibody discovery and characterization. As more sequence information on antibodies to different antigens is accumulated, we may be able in the future to construct a generalized sequence-based mode...
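
One looser way to look for public responses than a strict clonotype definition is to count germline gene pairings that recur across donors. The sketch below is only an illustration of that idea under stated assumptions: the record fields ("donor", "ighv", "light_v"), the function name, and the threshold of two donors are hypothetical choices, and the example records are invented rather than taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def recurring_pairs(
    antibodies: Iterable[Mapping[str, str]],
    min_donors: int = 2,
) -> Dict[Tuple[str, str], int]:
    """Count heavy/light V gene pairs seen in at least `min_donors` donors.

    Each record is assumed to carry a donor ID, a heavy-chain V gene
    ("ighv"), and a light-chain V gene ("light_v", kappa or lambda).
    """
    donors_by_pair = defaultdict(set)
    for ab in antibodies:
        donors_by_pair[(ab["ighv"], ab["light_v"])].add(ab["donor"])
    # Counting distinct donors (not antibodies) avoids treating clonal
    # expansion within a single donor as a public response.
    return {
        pair: len(donors)
        for pair, donors in donors_by_pair.items()
        if len(donors) >= min_donors
    }


# Invented example records for illustration only.
example_records = [
    {"donor": "D1", "ighv": "IGHV3-53", "light_v": "IGKV1-9"},
    {"donor": "D2", "ighv": "IGHV3-53", "light_v": "IGKV1-9"},
    {"donor": "D2", "ighv": "IGHV3-53", "light_v": "IGKV1-9"},
    {"donor": "D3", "ighv": "IGHV1-69", "light_v": "IGLV2-14"},
]
example_public = recurring_pairs(example_records)
```

Here the repeated pair from donor D2 is counted once, so the pairing is reported with a donor count of two. A fuller analysis would also compare against a naïve baseline, as the text describes for germline gene usage.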

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
