A global omics data sharing and analytics marketplace: Case study of a rapid data COVID-19 pandemic response platform
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
During public health emergencies, particularly in the early phase of an epidemic, it is essential that genetic and other healthcare data are shared across borders in a timely and accurate manner, before an outbreak becomes a global pandemic. However, although the COVID-19 pandemic has created a tidal wave of data, most patient data are siloed, not easily accessible, and, owing to small sample sizes, largely not actionable. Building on the precision medicine platform Shivom, a novel and secure data-sharing and data-analytics marketplace, we developed a versatile pandemic preparedness platform that allows healthcare professionals to rapidly share and analyze genetic data. The platform addresses several problems facing the global medical and research community, such as siloed data, cross-border data sharing, lack of state-of-the-art analytic tools, GDPR compliance, and ease of use. It serves as a central marketplace of ‘discoverability’, combining patient genomic and omics data sets, a marketplace for AI and bioinformatics algorithms, new diagnostic tools, and data-sharing capabilities to advance virus epidemiology and biomarker discovery. The bioinformatics marketplace contains some preinstalled COVID-19 pipelines for analyzing virus and host genomes without the need for bioinformatics expertise. The platform is intended as a fast route to insight into the association between virus-host interactions and COVID-19 in various populations, which could have a significant impact on managing the current pandemic and potential future disease outbreaks.
Article activity feed
SciScore for 10.1101/2020.09.28.20203257:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

| Sentences | Resources |
| --- | --- |
| The purpose of these other genetic file formats is to reduce file size – from many VCFs to just three distinct files: a) .bed (Plink binary biallelic genotype table), that contains the genotype call at biallelic variants, b) .bim (Plink extended MAP file) which also includes the names of the alleles: (chromosome, SNP, cM, base-position, allele 1, allele 2); c) .fam (Plink sample information file), is the last of the files and contains all the details with regards to the individuals including, whether there are parents in the datasets, and the sex (male/female). | Plink, suggested: (PLINK, RRID:SCR_001757) |
| CSV files can be used with most any spreadsheet program, such as Microsoft Excel or Google Spreadsheets. | Microsoft Excel, suggested: (Microsoft Excel, RRID:SCR_016137) |
| Another key specification of Nextflow is its integration with software repositories (including GitHub and BitBucket) and its native support for various cloud systems which provides rapid computation and effective scaling. | BitBucket, suggested: (Bitbucket, RRID:SCR_000502) |
| In addition to the data marketplace, this feature sets the platform apart from other cloud computing tools that use Nextflow or other workflow management tools such as Toil, Snakemake or Bpipe. | Bpipe, suggested: (Bpipe, RRID:SCR_003471) |
| GWAS pipeline to study virus-host interactions: The disadvantage of many research consortia, including some COVID-19 consortia, is that they do not open the data to all consortium members to analyze the collected data individually, but only provide data analysis by a central coordinator. | GWAS, suggested: (caGWAS, RRID:SCR_009617) |
| The current version of the platform comes with preinstalled COVID-19 specific pipelines covering assembly statistics, alignment statistics, virus variant calling, and metagenomics analysis, and other pipelines that can be used for analyzing Covid-19 patient data such as GWAS or MetaGWAS. | MetaGWAS, suggested: None |
| The pipeline uses the Kraken program metagenomics classification37–39. | Kraken, suggested: (Kraken, RRID:SCR_005484) |
| Using exact alignment of k-mers, the pipeline achieves classification significantly quicker to the fastest BLAST program. | BLAST, suggested: (BLASTX, RRID:SCR_001653) |
| The unaligned, viral sequences are then taken for de novo assembly using the Spades program and evaluated using the Quast program. | Spades, suggested: (SPAdes, RRID:SCR_000131); Quast, suggested: (QUAST, RRID:SCR_001228) |
| The contigs are subjected to gene/ORF prediction and the resulting sequences are further annotated using PROKKA. | PROKKA, suggested: (Prokka, RRID:SCR_014732) |
| The alignments are checked for duplicates and realigned using Picard. | Picard, suggested: (Picard, RRID:SCR_006525) |
| The variants are annotated with snpEff. | snpEff, suggested: (SnpEff, RRID:SCR_005191) |

Results from OddPub: Thank you for sharing your data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
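The first sentence quoted in Table 2 describes the PLINK .bim and .fam text files. As a minimal sketch of what those files hold, the Python below parses one line of each. It assumes the standard PLINK 1.9 column order (six whitespace-separated fields per line, with sex coded as 1 = male, 2 = female, and a parent ID of "0" meaning the parent is not in the dataset). The names `BimRecord`, `FamRecord`, `parse_bim_line`, `parse_fam_line`, and `has_parents_in_dataset`, as well as the sample lines, are hypothetical and are not taken from the manuscript or from the platform itself.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    """One variant from a .bim file (field names are illustrative)."""
    chromosome: str
    snp_id: str
    centimorgans: float
    base_position: int
    allele_1: str
    allele_2: str


class FamRecord(NamedTuple):
    """One individual from a .fam file (field names are illustrative)."""
    family_id: str
    individual_id: str
    father_id: str  # "0" when the father is not in the dataset
    mother_id: str  # "0" when the mother is not in the dataset
    sex: str        # "male", "female" or "unknown"
    phenotype: str


# Assumed PLINK 1.9 sex coding; any other code is treated as unknown.
SEX_CODES = {"1": "male", "2": "female"}


def parse_bim_line(line: str) -> BimRecord:
    """Split one .bim line into its six whitespace-separated columns."""
    chrom, snp, cm, pos, a1, a2 = line.split()
    return BimRecord(chrom, snp, float(cm), int(pos), a1, a2)


def parse_fam_line(line: str) -> FamRecord:
    """Split one .fam line into its six whitespace-separated columns."""
    fid, iid, father, mother, sex, pheno = line.split()
    return FamRecord(fid, iid, father, mother,
                     SEX_CODES.get(sex, "unknown"), pheno)


def has_parents_in_dataset(record: FamRecord) -> bool:
    """True when at least one parent ID points at another sample."""
    return record.father_id != "0" or record.mother_id != "0"


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    print(parse_bim_line("1\trs0001\t0\t752566\tA\tG"))
    child = parse_fam_line("FAM1 CHILD1 DAD1 MOM1 2 -9")
    print(child, has_parents_in_dataset(child))
```

The binary .bed genotype table is deliberately left out: it is not line-oriented text, and reading it requires the sample and variant counts from the other two files.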
