CROssBAR: Comprehensive Resource of Biomedical Relations with Deep Learning Applications and Knowledge Graph Representations
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Systemic analysis of available large-scale biological and biomedical data is critical for developing novel and effective treatment approaches against both complex and infectious diseases. Because different sections of the biomedical data are produced by different organizations/institutions using various types of technologies, the data are scattered across individual computational resources without any explicit relations/connections to each other, which greatly hinders comprehensive multi-omics-based analysis. We aimed to address this issue by constructing a new biological and biomedical data resource, CROssBAR, a comprehensive system that integrates large-scale biomedical data from various resources and stores them in a new NoSQL database, enriches these data with deep-learning-based predictions of relations between numerous biomedical entities, rigorously analyses the enriched data to obtain biologically meaningful modules, and displays them to users via easy-to-interpret, interactive and heterogeneous knowledge graph (KG) representations within an open-access, user-friendly online web service at https://crossbar.kansil.org. As a use-case study, we constructed CROssBAR COVID-19 KGs (available at https://crossbar.kansil.org/covid_main.php) that incorporate relevant virus and host genes/proteins, interactions, pathways, phenotypes and other diseases, as well as known and completely new predicted drugs/compounds. Our COVID-19 graphs can be utilized for a systems-level evaluation of relevant virus-host protein interactions, mechanisms, phenotypic implications and potential interventions.
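To make the idea of a heterogeneous knowledge graph concrete, the following is a minimal Python sketch of a graph whose nodes carry an entity type and whose edges record whether a relation is curated or predicted. It is an illustration only and not the CROssBAR implementation: the class `KnowledgeGraph`, the function `build_toy_graph`, the evidence labels and all node identifiers (`protein_A`, `drug_X`, and so on) are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy heterogeneous graph: typed nodes, labelled edges (hypothetical design)."""
    nodes: dict = field(default_factory=dict)  # node_id -> node_type
    edges: list = field(default_factory=list)  # (node_a, node_b, relation, evidence)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, a, b, relation, evidence="curated"):
        # Refuse edges that mention nodes we have not registered.
        for n in (a, b):
            if n not in self.nodes:
                raise KeyError(f"unknown node: {n}")
        self.edges.append((a, b, relation, evidence))

    def neighbors(self, node_id, node_type=None):
        """Sorted neighbours of node_id, optionally restricted to one node type."""
        found = set()
        for a, b, _, _ in self.edges:
            if node_id in (a, b):
                other = b if a == node_id else a
                if node_type is None or self.nodes[other] == node_type:
                    found.add(other)
        return sorted(found)

    def edges_by_evidence(self, evidence):
        """All edges carrying the given evidence label, e.g. 'predicted'."""
        return [(a, b, rel) for a, b, rel, ev in self.edges if ev == evidence]


def build_toy_graph():
    # Placeholder entities only; none of these correspond to real database entries.
    g = KnowledgeGraph()
    g.add_node("protein_A", "protein")
    g.add_node("drug_X", "drug")
    g.add_node("drug_Y", "drug")
    g.add_node("pathway_P", "pathway")
    g.add_node("disease_D", "disease")
    g.add_edge("protein_A", "drug_X", "targeted_by", evidence="curated")
    g.add_edge("protein_A", "drug_Y", "targeted_by", evidence="predicted")
    g.add_edge("protein_A", "pathway_P", "member_of")
    g.add_edge("protein_A", "disease_D", "associated_with")
    return g
```

Keeping the evidence label on each edge means known and predicted drug-target relations can sit in the same graph and still be told apart, for example with `build_toy_graph().edges_by_evidence("predicted")`.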
Article activity feed
-
SciScore for 10.1101/2020.09.14.296889:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (Sentences / Resources)
- Sentence: There are more than 100 million distinct drug candidate compound records in total in public bioactive chemical databases such as ChEMBL and PubChem, let alone the theoretical number of all possible small molecules around 10^60.
  Resources: ChEMBL, suggested: (ChEMBL, RRID:SCR_014042); PubChem, suggested: (PubChem, RRID:SCR_004284)
- Sentence: This approach leaves out the detailed reaction-based mechanistic information provided in pathway databases such as Reactome and KEGG pathways; however, the inclusion of this information via applying a pathway resource styled network approach would prevent the generation of large heterogeneous networks composed of tens of different pathways and other components.
  Resources: KEGG, suggested: (KEGG, RRID:SCR_012773)
- Sentence: Both Reactome and KEGG pathways provide the same type of biological information at the level of large-scale biological processes; however, Reactome also divides these processes into sub-pathways, whereas KEGG only provides the pathway information at a generic level.
  Resources: Reactome, suggested: (Reactome, RRID:SCR_003485)
- Sentence: In CROssBAR-WS, we incorporated the standard layouts of CytoScape Web, such as circle, cose, grid and concentric.
  Resources: CytoScape, suggested: (Cytoscape, RRID:SCR_003032)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
-
SciScore for 10.1101/2020.09.14.296889:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Experimental Models: Cell Lines (Sentences / Resources)
- Sentence: NCI-60 sulforhodamine B (SRB) cytotoxicity assay Huh7 and Mahlavu liver cells were grown in 96-well plates (1000-200 cells/well) in an incubator for 24 hours.
  Resources: NCI-60, suggested: None
- Sentence: Gene expression analysis of Chloroquine with NanoString multiplex gene expression panel Huh7 and Mahlavu liver cells were treated with CQ at cytotoxic doses of 3.6 μM and 12 μM, respectively, for 48 h.
  Resources: Huh7, suggested: None
Software and Algorithms (Sentences / Resources)
- Sentence: CROssBAR database (CROssBAR-DB) comprises carefully selected features from various data sources namely UniProt, IntAct, InterPro, Reactome, Ensembl, DrugBank, ChEMBL, PubChem, KEGG, OMIM, Orphanet, Gene Ontology, Experimental Factor Ontology (EFO) and Human Phenotype Ontology (HPO).
  Resources: InterPro, suggested: (InterPro, RRID:SCR_006695)
- Sentence: Several options are provided to users to customize the procedure both before the search, such as the UniProt databases to be used (UniProtKB/Swiss-Prot or UniProtKB/Swiss-Prot+UniProtKB/TrEMBL), taxons to be included, and the number of terms/nodes to include from each entity type (selected from enrichment score-based ranked lists).
  Resources: UniProt, suggested: (UniProtKB, RRID:SCR_004426)
- Sentence: However, it is possible to query the CROssBAR-DB using the provided API service, to obtain data entries from PubChem database collections.
  Resources: PubChem, suggested: (PubChem, RRID:SCR_004284)
- Sentence: The dataset is periodically updated with each ChEMBL database release.
  Resources: ChEMBL, suggested: (ChEMBL, RRID:SCR_014042)
- Sentence: Even though there are entries for proteins from hundreds of different organisms in the UniProtKB/Swiss-Prot database, only a few of these non-human protein entries possess annotations in terms of pathway memberships, targeting drugs/compounds and phenotype/disease implications.
  Resources: UniProtKB/Swiss-Prot, suggested: None
- Sentence: In CROssBAR-WS, we incorporated the standard layouts of CytoScape Web, such as circle, cose, grid and concentric.
  Resources: CytoScape, suggested: (Cytoscape, RRID:SCR_003032)
- Sentence: Second, we eliminated the protein entries that are not reviewed (i.e., not from UniProtKB/Swiss-Prot) except SARS-CoV-2 ORF10 (accession: A0A663DJA2), which currently is an unreviewed protein entry in UniProtKB/TrEMBL.
  Resources: UniProtKB/TrEMBL, suggested: None
- Sentence: We also filtered out a portion of the host genes/proteins using interaction-based information, according to their confidence scores reported in IntAct.
  Resources: IntAct, suggested: (IntAct, RRID:SCR_006944)
- Sentence: Finally, we added drug-disease relationships based on reported drug indications obtained from the KEGG resource.
  Resources: KEGG, suggested: (KEGG, RRID:SCR_012773)
- Sentence: We also merged nodes with respect to drug-compound entry correspondences in DrugBank and ChEMBL databases.
  Resources: DrugBank, suggested: (DrugBank, RRID:SCR_002700)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
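As a rough illustration of what checking for the presence of RRIDs can look like, here is a minimal Python sketch that pulls RRID-shaped identifiers out of free text with a regular expression. It is not SciScore's method: the pattern and the function name `find_rrids` are assumptions made for this example, and the check is purely syntactic, so it says nothing about whether an identifier actually exists in a registry or matches the named resource.

```python
import re

# Assumed shape: "RRID:" followed by a registry prefix, an underscore and an
# accession (e.g. SCR_014042). This is an illustrative pattern, not an official grammar.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9:_-]+)")


def find_rrids(text):
    """Return RRID-like identifiers in order of appearance, without the 'RRID:' prefix."""
    return RRID_PATTERN.findall(text)


sample = (
    "ChEMBL suggested: (ChEMBL, RRID:SCR_014042) "
    "PubChem suggested: (PubChem, RRID:SCR_004284)"
)
print(find_rrids(sample))
```

Run on the two resource entries quoted in the sample string, the function returns the identifiers for ChEMBL and PubChem; text with no `RRID:` marker yields an empty list.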
