Following the Trail of One Million Genomes: Footprints of SARS-CoV-2 Adaptation to Humans

Saymon Akther
Edgaras Bezrucenkovas
Li Li
Brian Sulkow
Lia Di
Desiree Pante
Che L. Martin
Benjamin J. Luft
Weigang Qiu

Listed in

Evaluated articles (ScreenIT)

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has accumulated genomic mutations at an approximately linear rate since it first infected human populations in late 2019. Controversies remain regarding the identity, proportion, and effects of adaptive mutations as SARS-CoV-2 evolves from a bat- to a human-adapted virus. The potential for vaccine-escape mutations poses additional challenges in pandemic control. Despite being of great interest to therapeutic and vaccine development, human-adaptive mutations in SARS-CoV-2 are masked by a genome-wide linkage disequilibrium under which neutral and even deleterious mutations can reach fixation by chance or through hitchhiking. Furthermore, genome-wide linkage disequilibrium imposes clonal interference by which multiple adaptive mutations compete against one another. Informed by insights from microbial experimental evolution, we analyzed close to one million SARS-CoV-2 genomes sequenced during the first year of the COVID-19 pandemic and identified putative human-adaptive mutations according to the rates of synonymous and missense mutations, temporal linkage, and mutation recurrence. Furthermore, we developed a forward-evolution simulator with the realistic SARS-CoV-2 genome structure and base substitution probabilities able to predict viral genome diversity under neutral, background selection, and adaptive evolutionary models. We conclude that adaptive mutations have emerged early, rapidly, and constantly to dominate SARS-CoV-2 populations despite clonal interference and purifying selection. Our analysis underscores a need for genomic surveillance of mutation trajectories at the local level for early detection of adaptive and immune-escape variants. Putative human-adaptive mutations are over-represented in viral proteins interfering with host immunity and binding host-cell receptors and thus may serve as priority targets for designing therapeutics and vaccines against human-adapted forms of SARS-CoV-2.
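To make the linkage disequilibrium statistics mentioned in the abstract concrete, the following is a minimal sketch of how D, D′, and r² could be computed for a pair of biallelic sites in haploid viral genomes. It is an illustration only, not the authors' pipeline: the function name `ld_stats` and the 0/1 allele encoding are assumptions made for this example.

```python
from typing import Sequence, Tuple


def ld_stats(site_a: Sequence[int], site_b: Sequence[int]) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic sites across haploid genomes.

    Hypothetical helper: each sequence holds one allele per genome,
    encoded as 0 (reference) or 1 (variant).
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("both sites need the same non-zero number of genomes")

    n = len(site_a)
    p_a = sum(site_a) / n  # variant frequency at site A
    p_b = sum(site_b) / n  # variant frequency at site B
    # Frequency of genomes carrying the variant allele at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("LD is undefined when a site is monomorphic")

    r2 = d * d / denom
    # D' rescales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return d, d_prime, r2


# Two variants that always occur together are in complete linkage.
linked = ld_stats([1, 1, 0, 0], [1, 1, 0, 0])
# Two variants that occur independently show no linkage.
unlinked = ld_stats([1, 1, 0, 0], [1, 0, 1, 0])
```

In a genome with little or no recombination, pairs of variants arising on the same lineage tend toward the fully linked case, which is why neutral mutations can hitchhike with adaptive ones.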

SciScore for 10.1101/2021.05.07.443114

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Unique haplotypes were obtained with custom Perl scripts based on the BioPerl package (Stajich et al. 2002).	BioPerl suggested: (BioPerl, RRID:SCR_002989)
A custom Python script sampled viral genomes (e.g., n=100) by month and at three spatial scales (continent, country, and state).	Python suggested: (IPython, RRID:SCR_001658)
Evolutionary statistics, including variant frequencies, linkage disequilibrium (r²), haplotypes, and base substitution frequencies were generated with programs BCFTools and VCFTools (Danecek et al. 2011).	VCFTools suggested: (VCFtools, RRID:SCR_001235)
We used Haploview (version 4.2) to calculate LD scores (D′ and r²) as well as their statistical significance between pairs of SNVs (Barrett 2009).	Haploview suggested: (Haploview, RRID:SCR_003076)
We used the DNAPARS program of the PHYLIP (version 3.696) package to search for a maximum parsimony tree of unique haplotypes, obtaining the homoplasy index (HI) and the number of base substitutions at each SNV site (Felsenstein 1989).	PHYLIP suggested: (PHYLIP, RRID:SCR_006244)
To ensure that all genomic sites were mutated at least once, we ran CovSimulator ten times such that the chance of a site not undergoing any mutation was small (p = 0.512^10 = 1.25e-3).	CovSimulator suggested: None
The R package pheatmap was used to generate heatmaps.	pheatmap suggested: (pheatmap, RRID:SCR_016418)
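
The CovSimulator entry in the table above rests on a short piece of arithmetic: if a site stays unmutated in a single run with probability 0.512, the chance that it stays unmutated across ten runs is that probability raised to the tenth power. The sketch below reproduces the calculation, assuming the runs are independent; the function name `prob_never_mutated` is hypothetical and not part of CovSimulator.

```python
def prob_never_mutated(p_unmutated_single_run: float, n_runs: int) -> float:
    """Probability that a site is unmutated in every one of n_runs runs.

    Assumes the runs are independent, so the per-run probabilities multiply.
    """
    if not 0.0 <= p_unmutated_single_run <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if n_runs < 0:
        raise ValueError("n_runs must be non-negative")
    return p_unmutated_single_run ** n_runs


# Values quoted in the table: 0.512 per run, ten runs.
p = prob_never_mutated(0.512, 10)
print(f"{p:.2e}")
```

With these inputs the result is about 1.24e-3, in line with the quoted 1.25e-3; the small difference presumably comes from rounding of the per-run value.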

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Version published to 10.1101/2021.05.07.443114 on bioRxiv
May 10, 2021

Related articles

Emergence and Evolution of Triple Reassortant Highly Pathogenic Avian Influenza A(H5N1) Virus, Argentina, 2025

This article has 15 authors:
1. Estefania Benedetti
2. Maria Carolina Artuso
3. Alexander M. P. Byrne
4. Maria de Belen Garibotto
5. Martín Avaro
6. Luana Erica Piccini
7. Ariana Chamorro
8. Marcelo Sciorra
9. Vanina Daniela Marchione
10. Mara Laura Russo
11. Maria Elena Dattero
12. Erika Macias Machicado
13. Monica Galiano
14. Nicola Lewis
15. Andrea Veronica Pontoriero
This article has no evaluations. Latest version: Dec 10, 2025

Genomic characterization of SARS-CoV-2 variants circulating in the population of Bangui, Central African Republic (CAR) in 2022.

This article has 15 authors:
1. Pulchérie Pelembi
2. Philippe Colson
3. Alain Farra
4. Ornella Anne Sibiro-Demi
5. Christian Noël Malaka
6. Aurélia Kwasiborski
7. Véronique Hourdel
8. Gilles Landry Ngaya
9. Romaric Nzoumbou-Boko
10. Jean-Claude Manuguerra
11. Emmanuel Ryvalin Nakoune-Yandoko
12. Guy VERNET
13. Bernard La Scola
14. Valérie Caro
15. Alexandre Manirakiza
This article has no evaluations. Latest version: Jan 19, 2026

Molecular and antigenic landscape of Influenza viruses circulating in Brazil during the 2025 season

This article has 21 authors:
1. Paola Resende
2. Fernando Motta
3. Katia Santos
4. Mirleide Santos
5. Elisa Pereira
6. Larissa Macedo
7. Aline Matos
8. Braulia Caetano
9. Luciana Appolinario
10. Laís Bento
11. Ana Isabel Silva
12. Thauane da Silva
13. Luana Barbagelata
14. Amanda Cruz
15. Nancy Bellei
16. Sonia Raboni
17. Working group Brazilian Laboratory Network for Respiratory V team
18. Miriam Teresinha Livorati
19. Walquiria Almeida
20. Marcelo Gomes
21. Marilda Siqueira
This article has no evaluations. Latest version: Jan 6, 2026
