Accommodating individual travel history, global mobility, and unsampled diversity in phylogeography: a SARS-CoV-2 case study

Philippe Lemey
Samuel Hong
Verity Hill
Guy Baele
Chiara Poletto
Vittoria Colizza
Áine O’Toole
John T. McCrone
Kristian G. Andersen
Michael Worobey
Martha I. Nelson
Andrew Rambaut
Marc A. Suchard

Abstract

Spatiotemporal bias in genome sequence sampling can severely confound phylogeographic inference based on discrete trait ancestral reconstruction. This has impeded our ability to accurately track the emergence and spread of SARS-CoV-2, the virus responsible for the COVID-19 pandemic. Despite the availability of unprecedented numbers of SARS-CoV-2 genomes on a global scale, evolutionary reconstructions are hindered by the slow accumulation of sequence divergence over its relatively short transmission history. When confronted with these issues, incorporating additional contextual data may critically inform phylodynamic reconstructions. Here, we present a new approach to integrate individual travel history data in Bayesian phylogeographic inference and apply it to the early spread of SARS-CoV-2, while also including global air transportation data. We demonstrate that including travel history data for each SARS-CoV-2 genome yields more realistic reconstructions of virus spread, particularly when travelers from undersampled locations are included to mitigate sampling bias. We further explore methods to ameliorate the impact of sampling bias by augmenting the phylogeographic analysis with lineages from undersampled locations. Our reconstructions reinforce specific transmission hypotheses suggested by the inclusion of travel history data, but also suggest alternative routes of virus migration that are plausible within the epidemiological context but are not apparent with current sampling efforts. Although further research is needed to fully examine the performance of our travel-aware phylogeographic analyses with unsampled diversity and to further improve them, they represent multiple new avenues for directly addressing the colossal issue of sampling bias in phylogeographic inference.

SciScore for 10.1101/2020.06.22.165464

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
We aligned the remaining 286 genomes using MAFFT 21 and partially trimmed the 5’ and 3’ ends.	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Upon visualizing root-to-tip divergence as a function of sampling time using TempEst 22 based on an ML tree inferred with IQ-TREE 23, we removed one potential outlier.	TempEst suggested: (TempEst, RRID:SCR_017304) IQ-TREE suggested: (IQ-TREE, RRID:SCR_017254)
We summarize posterior tree distributions using maximum clade credibility (MCC) trees and visualize them using FigTree.	FigTree suggested: (FigTree, RRID:SCR_008515)
A new BEAST tree sample tool (TaxaMarkovJumpHistoryAnalyzer available in the BEAST codebase at https://github.com/beast-dev/beast-mcmc) and associated R package constructs these estimates.	BEAST suggested: (BEAST, RRID:SCR_010228)
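
The root-to-tip check quoted in the table above (divergence plotted against sampling time, followed by removal of one potential outlier) can be illustrated with a minimal sketch. It is not TempEst's implementation, and the source does not say how the outlier was identified. The sketch assumes an ordinary least-squares fit and simply reports the tip with the largest absolute residual for manual inspection. The function names and the toy data are hypothetical.

```python
def root_to_tip_fit(sample_days, divergences):
    """Least-squares fit of divergence = intercept + rate * sampling day.

    Returns (rate, intercept, residuals). The slope is a crude estimate of
    the evolutionary rate; -intercept / rate estimates the root date.
    """
    n = len(sample_days)
    if n < 3 or n != len(divergences):
        raise ValueError("need at least three tips and equal-length inputs")
    mean_x = sum(sample_days) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in sample_days)
    if sxx == 0:
        raise ValueError("all tips share one sampling date; no temporal signal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sample_days, divergences))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    residuals = [y - (intercept + rate * x)
                 for x, y in zip(sample_days, divergences)]
    return rate, intercept, residuals


def most_deviant_tip(residuals):
    """Index of the tip whose divergence departs most from the fitted line."""
    return max(range(len(residuals)), key=lambda i: abs(residuals[i]))


# Hypothetical toy data (not from the paper): six tips sampled weekly,
# evolving at 2e-6 substitutions/site/day from a root 30 days before
# the first sample.
days = [0, 7, 14, 21, 28, 35]
clean = [2e-6 * (d + 30) for d in days]

# Inflate one tip's divergence to mimic a problematic sequence.
noisy = list(clean)
noisy[3] += 5e-4

rate, intercept, residuals = root_to_tip_fit(days, noisy)
suspect = most_deviant_tip(residuals)  # candidate for manual review
```

A flagged tip is only a candidate. In practice the decision to drop a sequence would rest on inspecting the plot and the sequence itself, as the quoted sentence implies.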

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Version published to 10.1101/2020.06.22.165464 on bioRxiv
Jun 23, 2020
