miRbiom: Machine-learning on Bayesian causal nets of RBP-miRNA interactions successfully predicts miRNA profiles

Abstract

The formation and expression of mature miRNAs is a tightly controlled process that depends heavily on post-transcriptional regulatory events. Recent findings suggest that several RNA-binding proteins (RBPs) beyond Drosha/Dicer are involved in miRNA processing. Deciphering conditional networks for these RBP-miRNA interactions may help explain the spatio-temporal nature of miRNAs and can also be used to predict miRNA profiles. To this end, >25 TB of data from different platforms (CLIP-seq, RNA-seq, miRNA-seq) were studied to develop Bayesian causal networks capable of explaining miRNA biogenesis. The networks explained miRNA formation well when tested across a large number of conditions and against experimentally validated data. The networks were modeled into an XGBoost machine-learning system, in which the expression information of the network components was found sufficient to quantitatively explain miRNA formation levels and profiles. Models were developed for 1,204 human miRNAs whose expression levels could be detected accurately from RNA-seq data alone, without separate miRNA profiling experiments such as miRNA-seq or arrays. The first of its kind, miRbiom performed consistently well, with high average accuracy (91%), when tested across a large number of experimentally established datasets from several conditions. It has been implemented as an interactive, open-access web server where, besides finding miRNA profiles, users can also carry out downstream functional analysis. miRbiom will help obtain accurate predictions of human miRNA profiles in the absence of profiling experiments and will be an asset for regulatory research. The study also shows the importance of RBP interaction information for better understanding miRNAs and their functional projectiles, and lays a foundation for such studies and software in the future.

SciScore for 10.1101/2020.06.18.156851:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Antibodies
Sentences	Resources
Antibodies: Primary antibodies raised against eGFP (Thermo Fisher)	Antibodies: Primary antibodies raised against eGFP suggested: None
Anti-Mouse IgG-horseradish peroxidase (HRP) (Bio-Rad) and Anti-Rabbit IgG-HRP (Bio-Rad) raised in goat (1:3000 dilution) were the secondary antibodies used in the study.	Anti-Mouse IgG-horseradish peroxidase suggested: None Anti-Rabbit IgG-HRP suggested: None
Primary antibodies raised against eGFP (Thermo Fisher),	eGFP suggested: None
Anti-Mouse IgG-horseradish peroxidase (HRP) (Bio-Rad) was the secondary antibody used in the study.	Anti-Mouse IgG-horseradish suggested: None
Experimental Models: Organisms/Strains
Sentences	Resources
mirDeep2 [60] was used for expression analysis of mature miRNAs from sRNA-seq data.	mirDeep2 suggested: None
Software and Algorithms
Sentences	Resources
Source of NGS Data and Data processing: Data for sRNA-seq and RNA-seq based high throughput studies were collected from Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) for 21 experiments (data volume ∼15.6TB) which included 47 different experimental conditions.	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO), RRID:SCR_005012) Sequence Read Archive suggested: (DDBJ Sequence Read Archive, RRID:SCR_001370)
For mapping RNA-seq reads across the human genome build 38 assembly (hg38) Seqmap [59] was used which is based on Bowtie platform.	Seqmap suggested: (SeqMap, RRID:SCR_005495) Bowtie suggested: (Bowtie, RRID:SCR_005476)
The alignment results were saved in SAM format for expression quantification. rSeq [59] was used for quantification of gene expression from SAM files.	rSeq suggested: (rSeq, RRID:SCR_000562)
For expression analysis of pre-miRNAs, same RNA-seq pipeline was used where RNA-seq reads were mapped over known pre-miRNA sequences collected from mirBase.	mirBase suggested: (miRBase, RRID:SCR_017497)
For processing of CLIP-seq data FastX (http://hannonlab.cshl.edu/fastx_toolkit/) was used.	http://hannonlab.cshl.edu/fastx_toolkit/ suggested: (FASTX-Toolkit, RRID:SCR_005534)
In order to observe if any correlation exists between the binding site density and RBP expression, CLIP-seq data and RNA-seq data in same experimental conditions were collected for the 73 RBPs from GEO and ENCODE databases.	ENCODE suggested: (Encode, RRID:SCR_015482)
To address this distinctive behavior, possible partners were collected for each RBP from STRING database [27].	STRING suggested: (STRING, RRID:SCR_005223)
A correlogram plot was constructed using the corr-plot R package to visualize the miRNA and RBP association during mature miRNA processing from pre-miRNA, combining all eight tissue expression data stated above.	corr-plot suggested: None
The experimentally validated targets of different miRNAs for both up and down were collected from mirTarbase database [54] separately.	mirTarbase suggested: (miRTarBase, RRID:SCR_017355)
The pathways were studied for three different databases such as KEGG, Wiki, and PantherDB.	KEGG suggested: (KEGG, RRID:SCR_012773)
Experiments were conducted in triplicates and statistical analysis was performed using GraphPad Prizm software version 7.0. (Protocol employed of the end point PCR, qPCR along with the list of primers used is provided in Table 5).	GraphPad Prizm suggested: None
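
One of the sentences quoted above describes checking whether binding site density correlates with RBP expression. The sketch below shows what such a check could look like for a single RBP. It assumes a Pearson correlation, which the quoted text does not specify; the function name and all numeric values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for one RBP across four conditions (illustration only).
rbp_expression = [2.0, 4.0, 6.0, 8.0]        # e.g. expression from RNA-seq
binding_site_density = [1.0, 2.0, 3.0, 4.0]  # e.g. sites per kb from CLIP-seq
r = pearson(rbp_expression, binding_site_density)
print(f"Pearson r = {r:.3f}")
```

A rank-based measure such as Spearman correlation would be a reasonable alternative if the relationship is monotonic but not linear; the quoted sentence does not say which measure was used.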

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).

Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 66. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
