A large-scale systematic survey of SARS-CoV-2 antibodies reveals recurring molecular features

This article has been reviewed by the following groups


Abstract

Over the past two years, global research efforts to combat the COVID-19 pandemic have led to the isolation and characterization of numerous human antibodies to the SARS-CoV-2 spike (S) protein. This enormous collection of antibodies provides an unprecedented opportunity to study the antibody response to a single antigen. By mining information from 88 research publications and 13 patents, we assembled a dataset of ∼8,000 human antibodies to the SARS-CoV-2 spike from >200 donors. Analysis of antibodies targeting different domains of the spike protein reveals a number of common (public) responses to SARS-CoV-2, exemplified by recurring IGHV/IGK(L)V pairs, CDR H3 sequences, IGHD usage, and somatic hypermutation. We further present a proof of concept for predicting antigen specificity with deep learning, using a model that distinguishes the sequences of antibodies to the SARS-CoV-2 spike from those of antibodies to influenza hemagglutinin (HA). Overall, this study not only provides an informative resource for antibody and vaccine research but also fundamentally advances our molecular understanding of public antibody responses to a viral pathogen.
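One simple way to operationalize a "recurring" IGHV/IGK(L)V pair is to count the distinct donors in which each heavy/light V-gene pairing appears. The sketch below assumes a hypothetical record layout of (donor, IGHV gene, IGK(L)V gene) and a threshold of two donors; it is an illustration, not the authors' pipeline, and the toy records are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def strip_allele(gene: str) -> str:
    """Drop the allele suffix so that e.g. IGHV3-53*01 and IGHV3-53*04 count as one gene."""
    return gene.split("*")[0]


def recurring_v_pairs(
    records: Iterable[Tuple[str, str, str]], min_donors: int = 2
) -> Dict[Tuple[str, str], int]:
    """Map each (IGHV, IGK(L)V) pair seen in >= min_donors distinct donors to its donor count.

    `records` holds hypothetical (donor_id, heavy_v_gene, light_v_gene) tuples.
    """
    donors_by_pair = defaultdict(set)
    for donor, ighv, iglv in records:
        donors_by_pair[(strip_allele(ighv), strip_allele(iglv))].add(donor)
    return {
        pair: len(donors)
        for pair, donors in donors_by_pair.items()
        if len(donors) >= min_donors
    }


# Toy records, invented for illustration only.
toy = [
    ("donor1", "IGHV3-53*01", "IGKV1-9*01"),
    ("donor2", "IGHV3-53*04", "IGKV1-9*01"),
    ("donor2", "IGHV1-58*01", "IGKV3-20*01"),
    ("donor3", "IGHV1-58*01", "IGKV3-20*01"),
    ("donor3", "IGHV4-39*01", "IGLV2-14*01"),
]
print(recurring_v_pairs(toy))
```

Counting donors rather than antibodies keeps a single donor who contributes many clonally related antibodies from making a pairing look public.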

Article activity feed

  1. SciScore for 10.1101/2021.11.26.470157:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Antibodies
    Sentences | Resources
    As of September 2021, there were 2,582 human SARS-CoV-2 antibodies in CoV-AbDab.
    SARS-CoV-2
    suggested: None
    Training detail: SARS-CoV-2 S antibodies and influenza HA antibodies with complete information for all six CDR sequences were identified.
    influenza HA
    suggested: None
    The following hyper-parameters were used for model training: Using the same training set, validation set and test set, the model performance of using the following inputs was compared: Performance Metrics: For evaluating model performance, S antibodies and HA antibodies were considered “positive” and “negative”, respectively.
    HA
    suggested: None
    Software and Algorithms
    Sentences | Resources
    Putative germline genes were identified by IgBLAST [66].
    IgBLAST
    suggested: (IgBLAST, RRID:SCR_002873)
    Sequence logos were generated by Logomaker in Python [68].
    Python
    suggested: (IPython, RRID:SCR_001658)
    Sequences of each antibody were from the original papers (Data S2) or NCBI GenBank database (www.ncbi.nlm.nih.gov/genbank) [52].
    NCBI GenBank
    suggested: (NCBI GenBank via FTP, RRID:SCR_010535)
    The areas under the ROC curve (ROC AUC) and the precision-recall curve (PR AUC) were computed using the “keras.metrics” module in TensorFlow [73].
    TensorFlow
    suggested: (tensorflow, RRID:SCR_016345)
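The training details quoted in the table (antibodies kept only if all six CDR sequences are available, S antibodies labeled "positive" and HA antibodies "negative", one fixed training/validation/test split reused across input comparisons) can be sketched as a small preprocessing step. The field names in `CDR_FIELDS`, the `antigen` key, the 80/10/10 split, and the fixed random seed are assumptions for illustration; the report does not state the data schema or the split proportions.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical field names for the six CDRs; the real dataset schema may differ.
CDR_FIELDS = ("cdrh1", "cdrh2", "cdrh3", "cdrl1", "cdrl2", "cdrl3")
# S antibodies are the "positive" class and HA antibodies the "negative" class.
LABELS = {"S": 1, "HA": 0}

Example = Tuple[Tuple[str, ...], int]


def build_examples(antibodies: Sequence[Dict[str, Optional[str]]]) -> List[Example]:
    """Keep antibodies with all six CDRs present and a known antigen; attach 1/0 labels."""
    examples = []
    for ab in antibodies:
        cdrs = tuple(ab.get(field) or "" for field in CDR_FIELDS)
        label = LABELS.get(ab.get("antigen") or "")
        if label is None or not all(cdrs):
            continue  # incomplete CDR information or antigen other than S/HA
        examples.append((cdrs, label))
    return examples


def split_examples(examples: List[Example], train_frac: float = 0.8,
                   val_frac: float = 0.1, seed: int = 0):
    """Shuffle once with a fixed seed and cut into train/validation/test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Fixing the seed reproduces the same three sets on every run, which is what makes a comparison of alternative model inputs on identical data meaningful.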
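With S antibodies as positives (1) and HA antibodies as negatives (0), the two reported metrics can be computed from model scores as sketched below. The study used `keras.metrics` in TensorFlow; this plain-Python version is only a stand-in for illustration. ROC AUC is computed exactly, as the probability that a randomly chosen positive outscores a randomly chosen negative, and PR AUC is approximated by average precision. `tf.keras.metrics.AUC` instead evaluates a fixed grid of thresholds (200 by default) with interpolation, so its values can differ slightly.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Exact ROC AUC: fraction of (positive, negative) pairs ranked correctly, ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common step-wise estimate of the area under the PR curve."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive example")
    # Highest score first; the sort is stable, so tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


labels = [1, 1, 0, 1, 0]  # 1 = SARS-CoV-2 S antibody, 0 = influenza HA antibody
scores = [0.9, 0.8, 0.7, 0.4, 0.2]  # toy model outputs, invented
print(roc_auc(labels, scores), average_precision(labels, scores))
```

For these toy scores, 5 of the 6 positive/negative pairs are ranked correctly, so ROC AUC is 5/6; average precision is (1 + 1 + 3/4) / 3.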

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Given the diverse types of public antibody responses to SARS-CoV-2 S, we need to acknowledge the limitation of using the conventional strict definition of a public clonotype to study public antibody responses. Public antibody responses to different antigens can have very different sequence features. For example, IGHV6-1 and IGHD3-9 are signatures of the public antibody response to influenza virus [24, 60-62], whereas IGHV3-23 is frequently used in antibodies to Dengue and Zika viruses [63]. In contrast, these germline genes are used less frequently in the antibody response to SARS-CoV-2 than in the naïve baseline (Figure 1B-C and Figure 3A). Since the binding specificity of an antibody is determined by its structure, which in turn is determined by its amino acid sequence, the antigen specificity of an antibody can theoretically be identified from its sequence. This study provides a proof of concept by training a deep learning model to distinguish between SARS-CoV-2 S antibodies and influenza HA antibodies based solely on primary sequence information. Technological advancements, such as single-cell high-throughput screening using the Berkeley Lights Beacon optofluidics device [64] and advances in paired B-cell receptor sequencing [65], have been accelerating antibody discovery and characterization. As more sequence information on antibodies to different antigens is accumulated, we may be able in the future to construct a generalized sequence-based mode...
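A strict public-clonotype definition of the kind referred to above is often formulated as two antibodies from different donors that share the same IGHV gene and CDR H3 length and have at least 80% CDR H3 amino acid identity. The thresholds and the keys `donor`, `ighv` and `cdrh3` below are assumptions for illustration (definitions vary between studies, and some also require the same IGHJ gene); the sketch mainly shows how narrow such a criterion is, which is the limitation the passage raises. The CDR H3 sequences are invented toy values.

```python
from typing import Mapping


def strip_allele(gene: str) -> str:
    """IGHV3-53*01 -> IGHV3-53, so allelic variants count as the same gene."""
    return gene.split("*")[0]


def share_public_clonotype(ab1: Mapping[str, str], ab2: Mapping[str, str],
                           min_identity: float = 0.8) -> bool:
    """Strict public-clonotype test under assumed criteria (see lead-in).

    Each antibody is a mapping with hypothetical keys "donor", "ighv" and "cdrh3".
    """
    if ab1["donor"] == ab2["donor"]:
        return False  # a public response must be shared across individuals
    if strip_allele(ab1["ighv"]) != strip_allele(ab2["ighv"]):
        return False
    h3_a, h3_b = ab1["cdrh3"], ab2["cdrh3"]
    if not h3_a or len(h3_a) != len(h3_b):
        return False
    # Equal lengths allow a position-by-position (ungapped) identity.
    identity = sum(a == b for a, b in zip(h3_a, h3_b)) / len(h3_a)
    return identity >= min_identity


ab_a = {"donor": "donor1", "ighv": "IGHV3-53*01", "cdrh3": "ARDLGYSSGWYFDY"}
ab_b = {"donor": "donor2", "ighv": "IGHV3-53*04", "cdrh3": "ARDLGYTSGWYFEY"}
print(share_public_clonotype(ab_a, ab_b))  # 12 of 14 positions identical
```

Two antibodies that share an IGHV/IGK(L)V pairing but differ in CDR H3 length would fail this test, even though they may represent a recurring response.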
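The comparison with the naïve baseline (e.g. IGHV6-1 being used less often in SARS-CoV-2 antibodies than in naïve repertoires) amounts to comparing gene-usage frequencies. A minimal sketch, assuming a log2 ratio of frequencies with a small pseudocount; the observed genes and baseline frequencies below are invented toy values, not data from the study, and the study's own statistics may differ.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def log2_usage_enrichment(observed_genes: Iterable[str],
                          baseline_freqs: Dict[str, float],
                          pseudocount: float = 1e-6) -> Dict[str, float]:
    """log2(observed frequency / baseline frequency) for each gene in the baseline.

    Positive values mean over-representation relative to the baseline,
    negative values under-representation. Genes absent from `baseline_freqs`
    are ignored.
    """
    counts = Counter(gene.split("*")[0] for gene in observed_genes)  # drop allele suffix
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observed genes")
    return {
        gene: math.log2((counts[gene] / total + pseudocount) / (base + pseudocount))
        for gene, base in baseline_freqs.items()
    }


# Invented toy numbers: 10 observed antibodies and a made-up naive baseline.
observed = ["IGHV3-53*01"] * 5 + ["IGHV1-58*01"] * 4 + ["IGHV6-1*01"]
baseline = {"IGHV3-53": 0.05, "IGHV1-58": 0.1, "IGHV6-1": 0.2}
print(log2_usage_enrichment(observed, baseline))
```

The pseudocount keeps the ratio finite when a gene is absent from either the observed set or the baseline.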

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as the reporting of sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, see the SciScore website.
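Checking for the presence of RRIDs can be approximated by pattern matching on the text. The simplified pattern below (a registry prefix such as SCR or AB, an underscore, and an alphanumeric accession) is an assumption for illustration and is not SciScore's implementation; some real RRIDs use other formats, and confirming that an identifier is correct still requires looking it up in the registry. The example sentence is annotated with the RRID suggested for IgBLAST in Table 2.

```python
import re
from typing import List

# Simplified pattern: "RRID:" + registry prefix + "_" + alphanumeric accession,
# e.g. RRID:SCR_002873. Not every registry follows this exact shape.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9]+)")


def find_rrids(text: str) -> List[str]:
    """Return the identifiers (without the "RRID:" prefix) found in `text`."""
    return RRID_PATTERN.findall(text)


sentence = "Putative germline genes were identified by IgBLAST (RRID:SCR_002873)."
print(find_rrids(sentence))
```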