A global omics data sharing and analytics marketplace: Case study of a rapid data COVID-19 pandemic response platform

Abstract

During public health emergencies, particularly in the early stage of an epidemic, it is fundamental that genetic and other healthcare data are shared across borders in a timely and accurate manner, before the outbreak becomes a global pandemic. However, although the COVID-19 pandemic has created a tidal wave of data, most patient data remain siloed, are not easily accessible, and, owing to small sample sizes, are largely not actionable. Based on the precision medicine platform Shivom, a novel and secure data-sharing and data-analytics marketplace, we developed a versatile pandemic preparedness platform that allows healthcare professionals to rapidly share and analyze genetic data. The platform addresses several problems faced by the global medical and research community, such as siloed data, cross-border data sharing, the lack of state-of-the-art analytic tools, GDPR compliance, and ease of use. The platform serves as a central marketplace of ‘discoverability’. It combines patient genomic and other omics data sets, a marketplace for AI and bioinformatics algorithms, new diagnostic tools, and data-sharing capabilities to advance virus epidemiology and biomarker discovery. The bioinformatics marketplace contains some preinstalled COVID-19 pipelines for analyzing virus and host genomes without the need for bioinformatics expertise. The platform aims to be the quickest way to gain insight into the association between virus-host interactions and COVID-19 in various populations, which could have a significant impact on managing the current pandemic and potential future disease outbreaks.

SciScore for 10.1101/2020.09.28.20203257

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
The purpose of these other genetic file formats is to reduce file size, replacing many VCFs with just three distinct files: (a) .bed (PLINK binary biallelic genotype table), which contains the genotype calls at biallelic variants; (b) .bim (PLINK extended MAP file), which also includes the allele names (chromosome, SNP, cM, base-pair position, allele 1, allele 2); and (c) .fam (PLINK sample information file), which contains the details of each individual, including whether their parents are in the dataset and their sex (male/female).	Plink suggested: (PLINK, RRID:SCR_001757)
CSV files can be used with almost any spreadsheet program, such as Microsoft Excel or Google Sheets.	Microsoft Excel suggested: (Microsoft Excel, RRID:SCR_016137)
Another key feature of Nextflow is its integration with software repositories (including GitHub and Bitbucket) and its native support for various cloud platforms, which enables rapid computation and effective scaling.	BitBucket suggested: (Bitbucket, RRID:SCR_000502)
In addition to the data marketplace, this feature sets the platform apart from other cloud computing tools that use Nextflow or other workflow management tools such as Toil, Snakemake or Bpipe.	Bpipe suggested: (Bpipe, RRID:SCR_003471)
GWAS pipeline to study virus-host interactions: A disadvantage of many research consortia, including some COVID-19 consortia, is that they do not give all consortium members access to the collected data for individual analysis; instead, data analysis is provided only by a central coordinator.	GWAS suggested: (caGWAS, RRID:SCR_009617)
The current version of the platform comes with preinstalled COVID-19-specific pipelines covering assembly statistics, alignment statistics, virus variant calling, and metagenomic analysis, as well as other pipelines that can be used to analyze COVID-19 patient data, such as GWAS or MetaGWAS.	MetaGWAS suggested: None
The pipeline uses the Kraken program for metagenomic classification (37–39).	Kraken suggested: (Kraken, RRID:SCR_005484)
Using exact alignment of k-mers, the pipeline achieves classification significantly faster than the fastest BLAST program.	BLAST suggested: (BLASTX, RRID:SCR_001653)
The unaligned viral sequences are then de novo assembled using the Spades program, and the assemblies are evaluated using the Quast program.	Spades suggested: (SPAdes, RRID:SCR_000131); Quast suggested: (QUAST, RRID:SCR_001228)
The contigs are subjected to gene/ORF prediction and the resulting sequences are further annotated using PROKKA.	PROKKA suggested: (Prokka, RRID:SCR_014732)
The alignments are checked for duplicates and realigned using Picard.	Picard suggested: (Picard, RRID:SCR_006525)
The variants are annotated with snpEff.	snpEff suggested: (SnpEff, RRID:SCR_005191)
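The PLINK row above describes a three-file binary fileset. As a minimal sketch of how the three files fit together, the Python below parses .bim and .fam text and decodes a SNP-major .bed payload into allele-1 counts per sample. It assumes the standard PLINK 1.9 layouts: six whitespace-delimited columns per .bim/.fam line, and a .bed file that starts with the bytes 0x6C 0x1B 0x01 and packs four 2-bit genotypes per byte. The names `parse_bim`, `parse_fam` and `decode_bed` and the toy data are hypothetical illustrations, not part of the platform or of PLINK itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:  # one .bim line
    chrom: str
    snp_id: str
    cm: float
    bp: int
    allele1: str
    allele2: str


@dataclass
class Sample:  # one .fam line
    fid: str
    iid: str
    father: str  # "0" means the father is not in the dataset
    mother: str  # "0" means the mother is not in the dataset
    sex: int     # 1 = male, 2 = female, 0 = unknown
    phenotype: str


def parse_bim(text: str) -> List[Variant]:
    variants = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, snp, cm, bp, a1, a2 = line.split()
        variants.append(Variant(chrom, snp, float(cm), int(bp), a1, a2))
    return variants


def parse_fam(text: str) -> List[Sample]:
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fid, iid, father, mother, sex, pheno = line.split()
        samples.append(Sample(fid, iid, father, mother, int(sex), pheno))
    return samples


BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # magic number + SNP-major mode flag
# 2-bit genotype code -> copies of allele 1 from the .bim file (None = missing)
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data: bytes, n_samples: int, n_variants: int) -> List[List[Optional[int]]]:
    if data[:3] != BED_MAGIC:
        raise ValueError("not a SNP-major PLINK .bed file")
    bytes_per_variant = (n_samples + 3) // 4  # four genotypes packed per byte
    if len(data) < 3 + n_variants * bytes_per_variant:
        raise ValueError(".bed payload is shorter than expected")
    calls = []
    for v in range(n_variants):
        start = 3 + v * bytes_per_variant
        block = data[start:start + bytes_per_variant]
        row = []
        for i in range(n_samples):
            # the first sample of each byte sits in the lowest two bits
            code = (block[i // 4] >> (2 * (i % 4))) & 0b11
            row.append(CODE_TO_A1_COUNT[code])
        calls.append(row)
    return calls


if __name__ == "__main__":
    variants = parse_bim("1 rs123 0 10500 A G\n1 rs456 0 20300 C T\n")
    samples = parse_fam("F1 S1 0 0 1 -9\nF1 S2 0 0 2 -9\nF1 S3 S1 S2 2 -9\n")
    # toy .bed: magic bytes, then one byte per variant (3 samples fit in 1 byte)
    bed = BED_MAGIC + bytes([0x18, 0x23])
    print(decode_bed(bed, len(samples), len(variants)))  # → [[2, 1, None], [0, 2, 1]]
    print(samples[2].father, samples[2].mother)          # → S1 S2
```

Because the genotype matrix carries no sample or variant names, the row order of .fam and .bim is what gives each decoded call its meaning; in practice, PLINK itself or an established reader library would handle the full format.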
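The Kraken rows above attribute the speed-up to exact k-mer matching rather than alignment. The sketch below illustrates that idea in a deliberately simplified form: it indexes canonical k-mers (each k-mer or its reverse complement, whichever sorts first) from reference genomes and labels a read by a majority vote of exact hits. Kraken itself maps each k-mer to the lowest common ancestor in a taxonomy and scores root-to-leaf paths, so the majority-vote rule here is a swapped-in simplification. The function names, toy references and k = 5 are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield each k-mer as the smaller of itself and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, revcomp(kmer))


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every canonical k-mer to the set of reference labels containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for label, genome in references.items():
        for kmer in canonical_kmers(genome, k):
            index[kmer].add(label)
    return dict(index)


def classify(read: str, index: Dict[str, Set[str]], k: int) -> str:
    """Majority vote over exact k-mer hits (not Kraken's LCA-based scoring)."""
    hits: Counter = Counter()
    for kmer in canonical_kmers(read, k):
        for label in index.get(kmer, ()):  # exact hash lookup, no alignment
            hits[label] += 1
    if not hits:
        return "unclassified"
    top = hits.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return "ambiguous"
    return top[0][0]


if __name__ == "__main__":
    refs = {"virusA": "ACGTACGTTTGACCA", "virusB": "GGGCCCAAATTTGGG"}
    idx = build_index(refs, k=5)
    print(classify("CGTACGTTTG", idx, k=5))           # → virusA
    print(classify(revcomp("CGTACGTTTG"), idx, k=5))  # → virusA
    print(classify("TTTTTTT", idx, k=5))              # → unclassified
```

Using canonical k-mers means a read sequenced from the opposite strand hits the same index entries, which is why the reverse-complemented read is still assigned to `virusA`.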
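The assembly-statistics pipeline and the QUAST evaluation listed above report contiguity metrics such as N50. A minimal sketch of the standard Nx/Lx definitions follows, assuming contig lengths are already known. The optional `min_contig` filter mirrors the idea behind QUAST's minimum contig length threshold (500 bp by default in QUAST), but `nx_lx` and its parameters are hypothetical names, not QUAST's interface.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], x: float = 50.0, min_contig: int = 0) -> Tuple[int, int]:
    """Return (Nx, Lx) for a set of contig lengths.

    Nx: length of the shortest contig among the largest contigs that
        together cover at least x% of the total assembly length.
    Lx: the number of contigs needed to reach that coverage.
    """
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    kept = sorted((n for n in lengths if n >= min_contig), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above min_contig")
    total = sum(kept)
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if running * 100 >= total * x:  # avoids dividing the total
            return length, count
    raise AssertionError("unreachable: running sum reaches total when x <= 100")


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]
    print(nx_lx(contigs))        # → (80, 2)
    print(nx_lx(contigs, x=90))  # → (30, 4)
```

In the toy example the total length is 280, so N50 is the contig at which the running sum first reaches 140 (100 + 80 = 180), giving N50 = 80 and L50 = 2.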

Results from OddPub: Thank you for sharing your data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
