Building a community-driven bioinformatics platform to facilitate Cannabis sativa multi-omics research

Curation statements for this article:
  • Curated by GigaByte

    Editors Assessment:

    This paper reports the establishment of the International Cannabis Genomics Research Consortium (ICGRC) web portal, which leverages the open-source Tripal platform to enhance data accessibility and integration for Cannabis sativa (cannabis) multi-omics research. The aim is to bring together the wealth of publicly available genomic, transcriptomic, proteomic, and metabolomic datasets to improve cannabis for food, fiber and medicinal traits. Tripal is a content management system for genomics data that provides ready-to-use, specialized ‘omics modules for loading, visualization, and analysis, and is compliant with GMOD (Generic Model Organism Database) standards. The paper explains how the portal was put together and what data and features are available, and provides a case study for other communities wanting to create their own Tripal platform. It covers the setup and customization of the Tripal platform, the re-engineering of modules for multi-omics data integration, and the addition of many other custom features that can be reused. Peer review fixed a few minor bugs and added clarifications on how the platform will be updated.

    *This evaluation refers to version 1 of the preprint

Abstract

Global changes in cannabis legislation after decades of stringent regulation and heightened demand for its industrial and medicinal applications have spurred recent genetic and genomics research. An international research community emerged and identified the need for a web portal to host cannabis-specific datasets that seamlessly integrates multiple data sources and serves omics-type analyses, fostering information sharing. The Tripal platform was used to host public genome assemblies, gene annotations, quantitative trait loci and genetic maps, gene and protein expression data, metabolic profiles and their sample attributes. Single nucleotide polymorphisms were called using public resequencing datasets on three genomes. Additional applications, such as SNP-Seek and MapManJS, were embedded into Tripal. A multi-omics data integration web-service Application Programming Interface (API), developed on top of existing Tripal modules, returns generic tables of samples, properties and values. Use cases demonstrate the API’s utility for various omics analyses, enabling researchers to perform multi-omics analyses efficiently.

Availability and implementation: The web portal can be accessed at www.icgrc.info.
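
The abstract describes the API as returning generic tables of samples, properties and values. As a rough illustration of why that long format is convenient for multi-omics work, the Python sketch below pivots such rows into one row per sample. It does not call the ICGRC API; the row layout, sample identifiers, property names, numbers, and helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical long-format rows (sample, property, value). The identifiers and
# numbers are invented for illustration and are not ICGRC data.
rows = [
    ("sample_A", "gene_1_expression", 12.5),
    ("sample_A", "metabolite_X", 0.8),
    ("sample_B", "gene_1_expression", 7.1),
    ("sample_B", "metabolite_X", 1.3),
    ("sample_B", "protein_P_abundance", 3.4),
]


def pivot_by_sample(long_rows):
    """Turn (sample, property, value) rows into {sample: {property: value}}."""
    wide = defaultdict(dict)
    for sample, prop, value in long_rows:
        wide[sample][prop] = value
    return dict(wide)


def to_matrix(wide, missing=None):
    """Return (samples, properties, matrix) with one matrix row per sample.

    Properties that a sample lacks are filled with `missing`.
    """
    samples = sorted(wide)
    properties = sorted({p for props in wide.values() for p in props})
    matrix = [[wide[s].get(p, missing) for p in properties] for s in samples]
    return samples, properties, matrix


wide = pivot_by_sample(rows)
samples, properties, matrix = to_matrix(wide)
```

Because every data type shares the same three columns, expression, protein and metabolite measurements can be combined per sample without type-specific parsing, which is presumably part of the appeal of a generic table.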

Article activity feed

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.137). These reviews are as follows.

    Reviewer 1. Weiwen Wang

    Is the code executable?

    Unable to test.

    This manuscript is about an online platform, and I am not sure how to test the code.

    Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

    Same as above.

    Additional Comments:

    With the increasing legalization of cannabis in many countries, research on this crop has become a hot topic. This manuscript by Mansueto et al. introduces a platform built on the Tripal framework and designed to facilitate multi-omics research in Cannabis sativa. The platform integrates genomic, transcriptomic, proteomic, and metabolomic data, providing researchers with a comprehensive resource for data analysis and sharing. Additionally, APIs have been developed to enable rapid querying. The manuscript provides detailed information on how to customize Tripal modules and the Chado schema for managing biological entities. Finally, it highlights the importance of standardization in data storage and analysis, proposing community-wide adoption of standardized nomenclature to ensure consistency and traceability of data. Overall, the platform is poised to become a valuable resource for cannabis research and to advance scientific progress in related fields.

    While this manuscript was engaging, particularly in the sections on Tripal "re-engineering" and controlled vocabulary, I do have several concerns.

    1 Because my registration (using business email) has not been approved, I cannot test the functions requiring ICGRC registration.

    2 The authors noted that the Cannabis Genome Browser has not been updated. Do the authors have a plan for updating ICGRC? If so, what is the proposed update frequency?

    3 ICGRC currently includes only a few cannabis cultivars, especially when compared to other platforms like Kannapedia and CannabisGDB. Do the authors have plans to add additional cultivars, such as First Light and Jamaican Lions mentioned in this manuscript, in the near future?

    4 When I tried to register using Gmail, an error popped up: ‘Domain is not allowed to register for this site’. Perhaps it would be clearer to instruct users to use a business email for registration directly.

    5 There is a data submission function in ICGRC, but exactly how this feature works remains unclear to me. If a user submits a cannabis genome to the ICGRC, will these data be visualized within specific modules on the platform, such as the synteny search or genetic mapping tools?

    Reviewer 2. Hongyun Shang

    Is the code executable?

    Unable to test.

    Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

    Unable to test.

    This is a comprehensive database with many features that fills a gap for cannabis, a species that had no genome database in the past. It is good work. Here are some minor suggestions:

    1. I did not find a function for searching gene and protein sequences directly by gene ID without providing a chromosome location, which is a common feature of many omics databases.
    2. In the chapter "The need for cannabis multi-omics databases and analysis platforms", in the sentence "There are no analysis tools or results available on this website", the wording "No results available" seems inappropriate.
    3. In the chapter "Cannabis - Omics, Genetic and Phenotypic Datasets in the Public Domain", the text reads '"Crop Ontology" Crop Ontology'. Is "Crop Ontology" repeated?