An interactive viral genome evolution network analysis system enabling rapid large-scale molecular tracing of SARS-CoV-2
Listed in
- Evaluated articles (ScreenIT)
Abstract
Comprehensive analyses of viral genomes can provide a global picture of SARS-CoV-2 transmission and help to predict the oncoming trends of the pandemic. This molecular tracing is mainly conducted through extensive phylogenetic network analyses. However, the rapid accumulation of SARS-CoV-2 genomes presents an unprecedented data size and complexity that exceeds the capacity of existing methods for constructing an evolution network through virus genotyping. Here we report a Viral genome Evolution Network Analysis System (VENAS), which uses Hamming distances adjusted by the minor allele frequency to construct a viral genome evolution network. The resulting network was topologically clustered and divided using a community detection algorithm, and potential evolution paths were further inferred with a network disassortativity trimming algorithm. We also employed parallel computing technology to achieve rapid processing and interactive visualization of >10,000 viral genomes, enabling accurate detection and subtyping of the viral mutations through different stages of the COVID-19 pandemic. In particular, several core viral mutations can be independently identified and linked to early transmission events in the COVID-19 pandemic. As a general platform for comprehensive viral genome analysis, VENAS serves as a useful computational tool in the current and future pandemics.
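The abstract does not spell out how the Hamming distance is adjusted by the minor allele frequency (MAF). The following is a minimal sketch under one stated assumption: each mismatching site contributes its column's MAF to the distance instead of a flat 1. The function names and the toy alignment are hypothetical and are not taken from VENAS.

```python
from collections import Counter


def minor_allele_frequencies(alignment):
    """Return the minor allele frequency (MAF) for each alignment column.

    The MAF is taken here as the frequency of the second most common base;
    it is 0.0 for columns where every sequence carries the same base.
    """
    n = len(alignment)
    mafs = []
    for column in zip(*alignment):
        counts = Counter(column).most_common()
        mafs.append(counts[1][1] / n if len(counts) > 1 else 0.0)
    return mafs


def maf_adjusted_hamming(seq_a, seq_b, mafs):
    """Sum the MAF weights of the sites where two aligned sequences differ.

    Assumption (not confirmed by the source): a mismatch at a site that
    varies in many genomes counts for more than a rare one-off change.
    """
    return sum(w for a, b, w in zip(seq_a, seq_b, mafs) if a != b)


# Toy alignment of four equal-length sequences (hypothetical data).
alignment = ["ACGT", "ACGA", "ATGA", "ATGA"]
mafs = minor_allele_frequencies(alignment)
distance = maf_adjusted_hamming(alignment[0], alignment[2], mafs)
```

In this toy case the first and third sequences differ at two sites, so `distance` is the sum of those two column weights rather than the plain Hamming count of 2. A pairwise matrix of such distances could then serve as edge weights when building the network.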
Article activity feed
SciScore for 10.1101/2020.12.09.417121:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: First, all SARS-CoV-2 sequences should be multiple-aligned using MAFFT [23] to obtain a consensus alignment in ma format.
  Resources: MAFFT, suggested: (MAFFT, RRID:SCR_011811)
- Sentence: Topological classification and major path recognition: Two third-party Python libraries, networkx (https://networkx.github.io/) and CDlib (https://github.com/GiulioRossetti/CDlib), were used in the topological clustering and backbone network extracting process once the viral genome evolution network was constructed.
  Resources: Python, suggested: (IPython, RRID:SCR_001658); https://networkx.github.io/, suggested: (NetworkX, RRID:SCR_016864)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Another caveat during network interpretation is that the “variant reversion” could be mistakenly called in some areas because the neighbor-joining method tends to force a fully connected network despite missing samples. In such a case, a base at a certain locus may appear to be mutated back and forth on two adjacent or nearby edges. The insufficient sampling of some specific genome types can lead to a missing node that is required to construct a coherent path of nodes reflecting the real transmission events. In such a situation, the algorithm will “enforce” the network construction through a neighbor node that is the closest to the real node. The enormous amount of mutational data accumulated during viral transmission was manifested in VENAS as large community-based clades through a modularity-based community detection algorithm such as Louvain. In each topological clade identified, the central node represents the predominant genome type in the given viral community, and the genome diversity of the viral community was reflected by the numbers of the inward and outward edges. We employed the network disassortativity trimming algorithm to merge the small clades at the edge of the network with the major clades, and the resulting backbone network of the major clades can effectively reflect the major evolutionary paths and associated core variations. In conclusion, genomic tracking is critical to understanding the global transmission of SARS-CoV-2. We developed a software plat...
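The "variant reversion" caveat quoted above can be checked mechanically along a single path of the network. The sketch below is illustrative only and is not part of VENAS: it assumes a path is given as an ordered list of aligned genotypes, and it flags any site that changes from one base to another on one edge and back again within a hypothetical `window` of following edges.

```python
def edge_mutations(parent, child):
    """List (position, from_base, to_base) for each site that differs on an edge."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(parent, child)) if a != b]


def find_reversions(path, window=2):
    """Flag sites that mutate X->Y on one edge and Y->X on a nearby later edge.

    `path` is an ordered list of aligned genotypes along one network path.
    `window` is the number of following edges that count as "nearby"
    (an illustrative parameter, not a VENAS setting).
    Returns a list of (position, first_edge_index, reverting_edge_index).
    """
    per_edge = [edge_mutations(p, c) for p, c in zip(path, path[1:])]
    hits = []
    for i, muts in enumerate(per_edge):
        for pos, a, b in muts:
            last = min(i + 1 + window, len(per_edge))
            for j in range(i + 1, last):
                if (pos, b, a) in per_edge[j]:
                    hits.append((pos, i, j))
    return hits


# Toy path of three genotypes (hypothetical data): site 1 goes C -> T -> C.
path = ["ACGT", "ATGT", "ACGA"]
reversions = find_reversions(path)
```

A flagged site does not prove a real back-mutation. As the quoted sentences note, it may simply mark a place where an unsampled intermediate genome forced the path through the nearest available neighbor.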
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
