Transmission of SARS-CoV-2 from China to Europe and West Africa: a detailed phylogenetic analysis

This article has been reviewed by the following groups


Abstract

Background

SARS-CoV-2, the virus causing the Covid-19 pandemic, emerged in December 2019 in China and raised fears that it could overwhelm healthcare systems worldwide. By June 2020, all African countries had registered human infections with SARS-CoV-2.

The virus is steadily mutating, and these changes are monitored through a well-curated database of viral nucleotide sequences from samples taken from infected individuals, enabling phylogenetic analysis and the study of phenotypic associations.

Methods

We downloaded SARS-CoV-2 sequences from four West African countries (Ghana, Gambia, Senegal and Nigeria) from the GISAID database and performed a phylogenetic analysis using the Nextstrain pipeline. Based on the mutations found within the sequences, we calculated and visualized statistics characterizing clades according to the GISAID nomenclature.
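The clade statistics rest on assigning each sequence to a GISAID clade from a few marker mutations and then tallying clades per country. The sketch below shows one way to do this. It assumes sequences already aligned in Wuhan-Hu-1 reference coordinates (no insertions) and uses a simplified marker table based on GISAID's 2020 clade definitions (L, S, V, G, GH, GR, with O for everything else). The marker table, the first-match rule and the function names `assign_clade` and `clade_percentages` are illustrative assumptions, not the authors' code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Simplified marker table (1-based positions in the Wuhan-Hu-1 reference,
# MN908947.3), assumed from GISAID's 2020 clade definitions. Order matters:
# the more specific GH and GR are tested before their parent clade G.
CLADE_MARKERS: Dict[str, Dict[int, str]] = {
    "GH": {241: "T", 3037: "T", 23403: "G", 25563: "T"},  # G plus ORF3a Q57H
    "GR": {241: "T", 3037: "T", 23403: "G", 28882: "A"},  # G plus N R203K/G204R
    "G": {241: "T", 3037: "T", 23403: "G"},  # carries spike D614G
    "S": {8782: "T", 28144: "C"},  # ORF8 L84S
    "V": {11083: "T", 26144: "T"},  # NSP6 L37F, ORF3a G251V
    "L": {241: "C", 3037: "C", 23403: "A", 8782: "C",
          11083: "G", 26144: "G", 28144: "T"},  # reference-like bases
}


def assign_clade(aligned_seq: str) -> str:
    """Return the first clade whose marker bases all match, else 'O' (other).

    aligned_seq must be in reference coordinates (no insertions).
    """
    seq = aligned_seq.upper()
    for clade, markers in CLADE_MARKERS.items():
        if all(pos <= len(seq) and seq[pos - 1] == base
               for pos, base in markers.items()):
            return clade
    return "O"


def clade_percentages(
    samples: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Map each country to the percentage of its samples in each clade."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for country, seq in samples:
        counts[country][assign_clade(seq)] += 1
    return {
        country: {clade: 100.0 * n / sum(c.values()) for clade, n in c.items()}
        for country, c in counts.items()
    }
```

In this sketch an ambiguous base (N) at any marker position sends a sequence to O. Later GISAID clades (for example GV) are not covered, so published statistics should rely on GISAID's own clade assignments.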

Results

We found country-specific patterns of viral clades: the later Europe-associated G-clades predominated in Senegal and Gambia, whereas Ghana and Nigeria showed combinations of the earlier (L, S, V) and later clades. Contrary to our expectations, the later Europe-associated G-clades emerged before the earlier clades. Detailed analysis of individual samples showed that some of the earlier clades might have circulated latently and that others reflect migration routes via Mali and Tunisia.

Conclusions

The distinct patterns of viral clades in the West African countries point to introductions of the virus from Europe and from China via Asia and Europe. The observation that the later clades emerged before the earlier clades could simply be due to founder effects or to latent circulation of the earlier clades. Only a marginal correlation could be identified between the G-clades, which are defined by the D614G mutation, and the relatively low case-fatality rates (0.6-3.2%).

Key messages

  • Ghana and Nigeria have a combination of earlier (L, V, S) and later Europe-associated G-clades of SARS-CoV-2, pointing to multiple introductions, whereas in Senegal and Gambia the Europe-associated G-clades predominate, pointing to introductions mainly from Europe.

  • Surprisingly, the later G-clades emerged in these countries before the earlier clades (L, V, S).

  • Detailed phylogenetic analysis points to latent circulation of earlier clades before the first registered cases.

  • Phylogenetic analysis of some cases points to migration routes to Europe via Tunisia, Egypt and Mali.

  • A marginal correlation (r = 0.28) can be detected between the percentage of samples carrying the D614G mutation, which defines the G-clades, and the case-fatality rate.
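The correlation in the last key message relates two series: the share of D614G-carrying samples and the case-fatality rate. The text does not name the method, so the sketch below assumes a Pearson coefficient. It also adds a Fisher z-transform confidence interval, which shows how weakly an r of 0.28 constrains the true correlation when only a few data points are available. The function names are hypothetical, and no study data are included.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length, non-constant samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% CI for a Pearson r via the Fisher z-transform.

    Requires n > 3 and |r| < 1.
    """
    if n <= 3:
        raise ValueError("Fisher CI needs n > 3")
    z = math.atanh(r)              # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

For instance, with only n = 4 points, `fisher_ci(0.28, 4)` spans roughly -0.93 to 0.98. The data would then be compatible with almost any correlation, which is consistent with calling r = 0.28 marginal.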

Article activity feed

  1. SciScore for 10.1101/2020.10.02.323519:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Construction of the phylogenetic tree: The phylogenetic tree was constructed using a pipeline adapted from the Zika virus pipeline on the nextstrain.org web page (12), employing the Augur (12), MAFFT (13) and IQ-TREE (14) software.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    IQ-tree
    suggested: (IQ-TREE, RRID:SCR_017254)
    Details of the steps performed: First, all West African and reference sequences in FASTA format were aligned with the augur command:
        augur align --sequences westafrica.fasta --reference-sequence sars_cov2_referencesequence.gb --output wa_aligned.fasta --fill-gaps
    which called MAFFT (13) with the command:
        mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 1 wa_aligned.fasta.to_align.fasta 1> wa_aligned.fasta 2> wa_aligned.fasta.log
    Metadata was extracted from the FASTA sequence headers with the augur command:
        augur parse --sequences wa_aligned.fasta --fields strain accession date --output-sequences wa_aligned_parsed.fasta --output-metadata metadata.tsv
    Then a tree was built with the augur command:
        augur tree --alignment wa_aligned_parsed.fasta --output wa_tree_raw.nwk
    which called IQ-TREE (14) with this command:
        iqtree -ninit 2 -n 2 -me 0.05 -nt 1 -s wa_aligned_parsed-delim.fasta -m GTR > wa_aligned_parsed-delim.iqtree.log
    The tree was refined with Augur, which called TreeTime (15) to infer a time-resolved phylogeny by maximum likelihood:
        augur refine --tree wa_tree_raw.nwk --alignment wa_aligned_parsed.fasta --metadata metadata.tsv --output-tree wa_tree.nwk --output-node-data wa_branch_lengths.json --timetree --coalescent opt --date-confidence --date-inference marginal --clock-filter-iqd 4 --keep-polytomies
    Here, the command from the Zika pipeline was adapted by adding --keep-polytomies to keep all samples.
    Augur
    suggested: None
    augur traits --tree wa_tree.nwk --metadata metadata_countries.tsv --output wa_traits.json --columns region country --confidence
    Augur was called to infer ancestral states of discrete characters, again using TreeTime (15):
        augur ancestral --tree wa_tree.nwk --alignment wa_aligned_parsed.fasta --output-node-data wa_nt_muts.json --inference joint
    Amino acid mutations were identified with the augur translate command:
        augur translate --tree wa_tree.nwk --ancestral-sequences wa_nt_muts.json --reference-sequence sars_cov2_referencesequence.gb --output wa_aa_muts.json
    Results were exported with the augur command:
        augur export v2 --tree wa_tree.nwk --metadata metadata_countries.tsv --node-data wa_branch_lengths.json wa_traits.json wa_nt_muts.json wa_aa_muts.json --colors colors.tsv --lat-longs lat_longs.tsv --auspice-config auspice_config.json --output wa_cov19.json
    Visualization of the phylogenetic tree and annotation with mutations and clades: The phylogenetic tree was annotated with key mutations using FigTree version 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
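The Augur steps quoted above are spread across several extracted sentences. The sketch below chains them in order in a single Python driver so the data flow from one file to the next is easier to follow. It reuses only the commands and file names given in the methods. The driver itself (`PIPELINE`, `pipeline_argv`, `run_pipeline`) is hypothetical and is not the authors' script. Flag names follow the Augur release used at the time, and newer releases may rename some of them, so check `augur <command> --help`. metadata_countries.tsv, which carries the region and country columns, is not produced by any listed command and was presumably prepared separately.

```python
import shlex
import subprocess
from typing import List

# Augur commands as reported in the methods, in execution order.
# MAFFT and IQ-TREE are not listed: augur align / augur tree call them.
PIPELINE = [
    "augur align --sequences westafrica.fasta --reference-sequence sars_cov2_referencesequence.gb --output wa_aligned.fasta --fill-gaps",
    "augur parse --sequences wa_aligned.fasta --fields strain accession date --output-sequences wa_aligned_parsed.fasta --output-metadata metadata.tsv",
    "augur tree --alignment wa_aligned_parsed.fasta --output wa_tree_raw.nwk",
    "augur refine --tree wa_tree_raw.nwk --alignment wa_aligned_parsed.fasta --metadata metadata.tsv --output-tree wa_tree.nwk --output-node-data wa_branch_lengths.json --timetree --coalescent opt --date-confidence --date-inference marginal --clock-filter-iqd 4 --keep-polytomies",
    "augur traits --tree wa_tree.nwk --metadata metadata_countries.tsv --output wa_traits.json --columns region country --confidence",
    "augur ancestral --tree wa_tree.nwk --alignment wa_aligned_parsed.fasta --output-node-data wa_nt_muts.json --inference joint",
    "augur translate --tree wa_tree.nwk --ancestral-sequences wa_nt_muts.json --reference-sequence sars_cov2_referencesequence.gb --output wa_aa_muts.json",
    "augur export v2 --tree wa_tree.nwk --metadata metadata_countries.tsv --node-data wa_branch_lengths.json wa_traits.json wa_nt_muts.json wa_aa_muts.json --colors colors.tsv --lat-longs lat_longs.tsv --auspice-config auspice_config.json --output wa_cov19.json",
]


def pipeline_argv() -> List[List[str]]:
    """Split each command string into an argv list suitable for subprocess."""
    return [shlex.split(cmd) for cmd in PIPELINE]


def run_pipeline(dry_run: bool = True) -> List[List[str]]:
    """Print every step; execute it only when dry_run is False.

    With check=True, the first failing step raises CalledProcessError,
    so later steps never run on missing inputs.
    """
    steps = pipeline_argv()
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return steps


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

The dry-run default only prints the commands. Setting `dry_run=False` runs them in an environment where Augur, MAFFT, IQ-TREE and the listed input files are available.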

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The limitations of this study are the sample size, possible selection bias of the samples and the intrinsic incompleteness of the phylogenetic analysis, which may lead to altered results when more samples are included. Nonetheless, as the first study of its kind, its data and concept should form the basis for a more extensive analysis as the number of sequenced samples increases. In conclusion, in this phylogenetic analysis of SARS-CoV-2, we found distinct patterns of viral clades: the later Europe-associated G-clades predominate in Senegal and Gambia, whereas Ghana and Nigeria show combinations of the earlier (L, S, V) and later clades. Intriguingly, the later clades emerged before the earlier clades, which could simply be due to founder effects or to latent circulation of the earlier clades. Only a marginal correlation could be found between the G-clades in the West African countries and mortality, which fortunately is at a rather low level, therefore disproving fears that the pandemic would massively overwhelm the health systems in Africa. The rather young population and the climate might be factors favoring this low fatality rate compared with Western countries; nevertheless, a cautious balance between health protection and the economy might prevent future disastrous outbreaks.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.