CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.
Article activity feed
SciScore for 10.1101/2020.09.14.296889:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: There are more than 100 million distinct drug candidate compound records in total in public bioactive chemical databases such as ChEMBL and PubChem, let alone the theoretical number of all possible small molecules around 10^60.
  Resources: ChEMBL, suggested: (ChEMBL, RRID:SCR_014042); PubChem, suggested: (PubChem, RRID:SCR_004284)
- Sentence: This approach leaves out the detailed reaction-based mechanistic information provided in pathway databases such as Reactome and KEGG pathways; however, the inclusion of this information via applying a pathway resource styled network approach would prevent the generation of large heterogeneous networks composed of tens of different pathways and other components.
  Resources: KEGG, suggested: (KEGG, RRID:SCR_012773)
- Sentence: Both Reactome and KEGG pathways provide the same type of biological information at the level of large-scale biological processes; however, Reactome also divides these processes into sub-pathways, whereas KEGG only provides the pathway information at a generic level.
  Resources: Reactome, suggested: (Reactome, RRID:SCR_003485)
- Sentence: In CROssBAR-WS, we incorporated the standard layouts of CytoScape Web, such as circle, cose, grid and concentric.
  Resources: CytoScape, suggested: (Cytoscape, RRID:SCR_003032)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
SciScore for 10.1101/2020.09.14.296889:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Experimental Models: Cell Lines
- Sentence: NCI-60 sulforhodamine B (SRB) cytotoxicity assay Huh7 and Mahlavu liver cells were grown in 96-well plates (1000-200 cells/well) in an incubator for 24 hours.
  Resources: NCI-60, suggested: None
- Sentence: Gene expression analysis of Chloroquine with NanoString multiplex gene expression panel Huh7 and Mahlavu liver cells were treated with CQ at cytotoxic doses of 3.6 μM and 12 μM, respectively, for 48 h.
  Resources: Huh7, suggested: None
Software and Algorithms
- Sentence: CROssBAR database (CROssBAR-DB) comprises carefully selected features from various data sources namely UniProt, IntAct, InterPro, Reactome, Ensembl, DrugBank, ChEMBL, PubChem, KEGG, OMIM, Orphanet, Gene Ontology, Experimental Factor Ontology (EFO) and Human Phenotype Ontology (HPO).
  Resources: InterPro, suggested: (InterPro, RRID:SCR_006695)
- Sentence: Several options are provided to users to customize the procedure both before the search, such as the UniProt databases to be used (UniProtKB/Swiss-Prot or UniProtKB/Swiss-Prot+UniProtKB/TrEMBL), taxons to be included, and the number of terms/nodes to include from each entity type (selected from enrichment score-based ranked lists).
  Resources: UniProt, suggested: (UniProtKB, RRID:SCR_004426)
- Sentence: However, it is possible to query the CROssBAR-DB using the provided API service, to obtain data entries from PubChem database collections.
  Resources: PubChem, suggested: (PubChem, RRID:SCR_004284)
- Sentence: The dataset is periodically updated with each ChEMBL database release.
  Resources: ChEMBL, suggested: (ChEMBL, RRID:SCR_014042)
- Sentence: Even though there are entries for proteins from hundreds of different organisms in the UniProtKB/Swiss-Prot database, only a few of these non-human protein entries possess annotations in terms of pathway memberships, targeting drugs/compounds and phenotype/disease implications.
  Resources: UniProtKB/Swiss-Prot, suggested: None
- Sentence: In CROssBAR-WS, we incorporated the standard layouts of CytoScape Web, such as circle, cose, grid and concentric.
  Resources: CytoScape, suggested: (Cytoscape, RRID:SCR_003032)
- Sentence: Second, we eliminated the protein entries that are not reviewed (i.e., not from UniProtKB/Swiss-Prot) except SARS-CoV-2 ORF10 (accession: A0A663DJA2), which currently is an unreviewed protein entry in UniProtKB/TrEMBL.
  Resources: UniProtKB/TrEMBL, suggested: None
- Sentence: We also filtered out a portion of the host genes/proteins using interaction-based information, according to their confidence scores reported in IntAct.
  Resources: IntAct, suggested: (IntAct, RRID:SCR_006944)
- Sentence: Finally, we added drug-disease relationships based on reported drug indications obtained from the KEGG resource.
  Resources: KEGG, suggested: (KEGG, RRID:SCR_012773)
- Sentence: We also merged nodes with respect to drug-compound entry correspondences in DrugBank and ChEMBL databases.
  Resources: DrugBank, suggested: (DrugBank, RRID:SCR_002700)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
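To make the RRID check concrete, here is a minimal Python sketch of how identifiers such as RRID:SCR_014042 could be pulled out of a resource table like the ones above. It is not SciScore's actual method: the regular expression is a simplified assumption about the identifier format, and the function name extract_rrids is hypothetical. It only detects the presence of RRIDs; verifying their correctness would require a lookup against a registry, which is not shown.

```python
import re
from typing import List

# Simplified pattern (an assumption for illustration only): "RRID:" followed by
# an alphabetic authority prefix such as SCR, an underscore, and an identifier.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9:_-]+)")


def extract_rrids(text: str) -> List[str]:
    """Return the distinct RRIDs found in text, in order of first appearance."""
    seen: List[str] = []
    for match in RRID_PATTERN.finditer(text):
        rrid = match.group(1)
        if rrid not in seen:
            seen.append(rrid)
    return seen


# Sample text modelled on the resource entries listed in the tables above.
sample = (
    "ChEMBL, suggested: (ChEMBL, RRID:SCR_014042); "
    "PubChem, suggested: (PubChem, RRID:SCR_004284); "
    "ChEMBL, suggested: (ChEMBL, RRID:SCR_014042)"
)
print(extract_rrids(sample))
```

Duplicates are collapsed because the same resource is often suggested for several sentences of one manuscript.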
