A global omics data sharing and analytics marketplace: Case study of a rapid data COVID-19 pandemic response platform

Abstract

Under public health emergencies, particularly an early epidemic, it is fundamental that genetic and other healthcare data are shared across borders in a timely and accurate manner before the outbreak develops into a global pandemic. However, although the COVID-19 pandemic has created a tidal wave of data, most patient data are siloed, not easily accessible, and, because of low sample sizes, largely not actionable. Based on the precision medicine platform Shivom, a novel and secure data sharing and data analytics marketplace, we developed a versatile pandemic preparedness platform that allows healthcare professionals to rapidly share and analyze genetic data. The platform addresses several problems of the global medical and research community, such as siloed data, cross-border data sharing, lack of state-of-the-art analytic tools, GDPR compliance, and ease of use. It serves as a central marketplace of ‘discoverability’, combining patient genomic and omics data sets, a marketplace for AI and bioinformatics algorithms, new diagnostic tools, and data-sharing capabilities to advance virus epidemiology and biomarker discovery. The bioinformatics marketplace contains preinstalled COVID-19 pipelines to analyze virus and host genomes without the need for bioinformatics expertise. The platform is intended as a quick way to gain insight into the association between virus-host interactions and COVID-19 in various populations, which can have a significant impact on managing the current pandemic and potential future disease outbreaks.

Article activity feed

  1. SciScore for 10.1101/2020.09.28.20203257:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The purpose of these other genetic file formats is to reduce file size, from many VCFs to just three distinct files: a) .bed (Plink binary biallelic genotype table), which contains the genotype calls at biallelic variants; b) .bim (Plink extended MAP file), which also includes the names of the alleles (chromosome, SNP, cM, base-position, allele 1, allele 2); c) .fam (Plink sample information file), the last of the files, which contains all the details regarding the individuals, including whether there are parents in the dataset and the sex (male/female).
    Plink
    suggested: (PLINK, RRID:SCR_001757)
    CSV files can be used with almost any spreadsheet program, such as Microsoft Excel or Google Spreadsheets.
    Microsoft Excel
    suggested: (Microsoft Excel, RRID:SCR_016137)
    Another key feature of Nextflow is its integration with software repositories (including GitHub and BitBucket) and its native support for various cloud systems, which together provide rapid computation and effective scaling.
    BitBucket
    suggested: (Bitbucket, RRID:SCR_000502)
    In addition to the data marketplace, this feature sets the platform apart from other cloud computing tools that use Nextflow or other workflow management tools such as Toil, Snakemake or Bpipe.
    Bpipe
    suggested: (Bpipe, RRID:SCR_003471)
    GWAS pipeline to study virus-host interactions: The disadvantage of many research consortia, including some COVID-19 consortia, is that they do not open the data to all consortium members to analyze the collected data individually, but only provide data analysis by a central coordinator.
    GWAS
    suggested: (caGWAS, RRID:SCR_009617)
    The current version of the platform comes with preinstalled COVID-19-specific pipelines covering assembly statistics, alignment statistics, virus variant calling, and metagenomics analysis, as well as other pipelines that can be used for analyzing COVID-19 patient data, such as GWAS or MetaGWAS.
    MetaGWAS
    suggested: None
    The pipeline uses the Kraken program for metagenomics classification [37–39].
    Kraken
    suggested: (Kraken, RRID:SCR_005484)
    Using exact alignment of k-mers, the pipeline achieves classification significantly quicker than the fastest BLAST program.
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    The unaligned, viral sequences are then taken for de novo assembly using the Spades program and evaluated using the Quast program.
    Spades
    suggested: (SPAdes, RRID:SCR_000131)
    Quast
    suggested: (QUAST, RRID:SCR_001228)
    The contigs are subjected to gene/ORF prediction and the resulting sequences are further annotated using PROKKA.
    PROKKA
    suggested: (Prokka, RRID:SCR_014732)
    The alignments are checked for duplicates and realigned using Picard.
    Picard
    suggested: (Picard, RRID:SCR_006525)
    The variants are annotated with snpEff.
    snpEff
    suggested: (SnpEff, RRID:SCR_005191)
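
The first sentence in the table above lists the columns of the PLINK .bim file and the kind of sample details kept in the .fam file. As a rough illustration only, and not the platform's own code, the sketch below parses one line of each text file in Python. The record names, the helper functions, and the sample values are hypothetical; the six-column .fam layout and the sex codes (1 = male, 2 = female) follow the standard PLINK convention rather than anything stated in the preprint. The binary .bed file is not handled here.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    """One variant row of a .bim file (hypothetical record name)."""
    chrom: str
    snp_id: str
    cm: float       # genetic position in centimorgans
    bp: int         # base-pair position
    allele1: str
    allele2: str


class FamRecord(NamedTuple):
    """One sample row of a .fam file (hypothetical record name)."""
    family_id: str
    sample_id: str
    father_id: str  # "0" means the father is not in the dataset
    mother_id: str  # "0" means the mother is not in the dataset
    sex: str
    phenotype: str


# Assumption: standard PLINK sex coding; any other code is treated as unknown.
SEX_CODES = {"1": "male", "2": "female"}


def parse_bim_line(line: str) -> BimRecord:
    """Split a whitespace-delimited .bim line into its six columns."""
    chrom, snp_id, cm, bp, a1, a2 = line.split()
    return BimRecord(chrom, snp_id, float(cm), int(bp), a1, a2)


def parse_fam_line(line: str) -> FamRecord:
    """Split a whitespace-delimited .fam line into its six columns."""
    fid, iid, father, mother, sex, pheno = line.split()
    return FamRecord(fid, iid, father, mother,
                     SEX_CODES.get(sex, "unknown"), pheno)


def has_parents(rec: FamRecord) -> bool:
    """True if at least one parent of this sample is listed in the dataset."""
    return rec.father_id != "0" or rec.mother_id != "0"


if __name__ == "__main__":
    # Made-up example rows, not taken from any real dataset.
    variant = parse_bim_line("1\tsnp_example\t0\t12345\tA\tG")
    child = parse_fam_line("FAM1 child1 dad1 mom1 2 -9")
    print(variant)
    print(child, has_parents(child))
```

Keeping variant metadata (.bim) and sample metadata (.fam) in small text files next to a compact binary genotype table is what lets the three-file set replace many VCFs, as the quoted sentence describes.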

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.