miRbiom: Machine-learning on Bayesian causal nets of RBP-miRNA interactions successfully predicts miRNA profiles

Abstract

The formation of mature miRNAs and their expression is a tightly controlled process that depends heavily on post-transcriptional regulatory events. Recent findings suggest that several RNA-binding proteins (RBPs) beyond Drosha/Dicer are involved in miRNA processing. Deciphering the conditional networks of these RBP-miRNA interactions may help explain the spatio-temporal nature of miRNAs, and such networks can also be used to predict miRNA profiles. To this end, >25TB of data from different platforms (CLIP-seq/RNA-seq/miRNA-seq) were studied to develop Bayesian causal networks capable of reasoning about miRNA biogenesis. The networks ably explained miRNA formation when tested across a large number of conditions and experimentally validated data. The networks were modeled into an XGBoost machine learning system in which expression information of the network components was found capable of quantitatively explaining miRNA formation levels and profiles. Models were developed for 1,204 human miRNAs whose expression levels could be accurately detected directly from RNA-seq data alone, without separate miRNA profiling experiments such as miRNA-seq or arrays. The first of its kind, miRbiom performed consistently well, with high average accuracy (91%), when tested across a large number of experimentally established data from several conditions. It has been implemented as an interactive open-access web server where, besides finding miRNA profiles, their downstream functional analysis can also be done. miRbiom will help obtain accurate predictions of human miRNA profiles in the absence of profiling experiments and will be an asset for regulatory research. The study also shows the importance of RBP interaction information for better understanding miRNAs and their functional projectiles, and it lays the foundation for such studies and software in the future.

Article activity feed

  1. SciScore for 10.1101/2020.06.18.156851:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Antibodies
    Sentences | Resources
    Antibodies: Primary antibodies raised against eGFP (Thermo Fisher)
    Antibodies: Primary antibodies raised against eGFP
    suggested: None
    Anti-Mouse IgG-horseradish peroxidase (HRP) (Bio-Rad) and Anti-Rabbit IgG-HRP (Bio-Rad) raised in goat (1:3000 dilution) were the secondary antibodies used in the study.
    Anti-Mouse IgG-horseradish peroxidase
    suggested: None
    Anti-Rabbit IgG-HRP
    suggested: None
    Primary antibodies raised against eGFP (Thermo Fisher),
    eGFP
    suggested: None
    Anti-Mouse IgG-horseradish peroxidase (HRP) (Bio-Rad) was the secondary antibody used in the study.
    Anti-Mouse IgG-horseradish
    suggested: None
    Experimental Models: Organisms/Strains
    Sentences | Resources
    mirDeep2 [60] was used for expression analysis of mature miRNAs from sRNA-seq data.
    mirDeep2
    suggested: None
    Software and Algorithms
    Sentences | Resources
    Source of NGS Data and Data processing: Data for sRNA-seq and RNA-seq based high throughput studies were collected from Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) for 21 experiments (data volume ∼15.6TB) which included 47 different experimental conditions.
    Gene Expression Omnibus
    suggested: (Gene Expression Omnibus (GEO), RRID:SCR_005012)
    Sequence Read Archive
    suggested: (DDBJ Sequence Read Archive, RRID:SCR_001370)
    For mapping RNA-seq reads across the human genome build 38 assembly (hg38) Seqmap [59] was used which is based on Bowtie platform.
    Seqmap
    suggested: (SeqMap, RRID:SCR_005495)
    Bowtie
    suggested: (Bowtie, RRID:SCR_005476)
    The alignment results were saved in SAM format for expression quantification. rSeq [59] was used for quantification of gene expression from SAM files.
    rSeq
    suggested: (rSeq, RRID:SCR_000562)
    For expression analysis of pre-miRNAs, same RNA-seq pipeline was used where RNA-seq reads were mapped over known pre-miRNA sequences collected from mirBase.
    mirBase
    suggested: (miRBase, RRID:SCR_017497)
    For processing of CLIP-seq data FastX (http://hannonlab.cshl.edu/fastx_toolkit/) was used.
    http://hannonlab.cshl.edu/fastx_toolkit/
    suggested: (FASTX-Toolkit, RRID:SCR_005534)
    In order to observe if any correlation exists between the binding site density and RBP expression, CLIP-seq data and RNA-seq data in same experimental conditions were collected for the 73 RBPs from GEO and ENCODE databases.
    ENCODE
    suggested: (Encode, RRID:SCR_015482)
    To address this distinctive behavior, possible partners were collected for each RBP from STRING database [27].
    STRING
    suggested: (STRING, RRID:SCR_005223)
    A correlogram plot was constructed using the corr-plot R package to visualize the miRNA and RBP association during mature miRNA processing from pre-miRNA, combining all eight tissue expression data stated above.
    corr-plot
    suggested: None
    The experimentally validated targets of different miRNAs for both up and down were collected from mirTarbase database [54] separately.
    mirTarbase
    suggested: (miRTarBase, RRID:SCR_017355)
    The pathways were studied for three different databases such as KEGG, Wiki, and PantherDB.
    KEGG
    suggested: (KEGG, RRID:SCR_012773)
    Experiments were conducted in triplicates and statistical analysis was performed using GraphPad Prizm software version 7.0. (Protocol employed of the end point PCR, qPCR along with the list of primers used is provided in Table 5).
    GraphPad Prizm
    suggested: None

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 66. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
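
    As a rough illustration of what checking for the presence of RRIDs can look like, the sketch below pulls identifiers out of report text with a regular expression. It is not SciScore's implementation: the pattern, the function name `extract_rrids`, and the sample text are assumptions made for this example, and the pattern only covers identifiers of the form seen in the report above (an authority prefix such as SCR, an underscore, then an identifier).

```python
import re

# Assumed pattern (not SciScore's own): "RRID:" followed by an authority
# prefix made of letters, an underscore, and an alphanumeric identifier.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+)_([A-Za-z0-9_-]+)")


def extract_rrids(text):
    """Return the RRIDs found in text, in order of first appearance, without duplicates."""
    found = []
    for match in RRID_PATTERN.finditer(text):
        rrid = f"RRID:{match.group(1)}_{match.group(2)}"
        if rrid not in found:
            found.append(rrid)
    return found


# Sample lines copied from the resource table above.
report = (
    "suggested: (SeqMap, RRID:SCR_005495)\n"
    "suggested: (Bowtie, RRID:SCR_005476)\n"
    "suggested: None\n"
)
rrids = extract_rrids(report)
print(rrids)
```

    A pattern match only shows that an identifier is present and well-formed; confirming that it points to the intended resource would require a lookup against the RRID registry, which this sketch does not attempt.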