The snoGloBe interaction predictor reveals a broad spectrum of C/D snoRNA RNA targets
Listed in: Evaluated articles (ScreenIT)
Abstract
Box C/D small nucleolar RNAs (snoRNAs) are a conserved class of RNA known for their role in guiding ribosomal RNA 2′-O-ribose methylation. Recently, C/D snoRNAs were also implicated in regulating the expression of non-ribosomal genes through different modes of binding. Large scale RNA–RNA interaction datasets detect many snoRNAs binding messenger RNA, but are limited by specific experimental conditions. To enable a more comprehensive study of C/D snoRNA interactions, we created snoGloBe, a human C/D snoRNA interaction predictor based on a gradient boosting classifier. SnoGloBe considers the target type, position and sequence of the interactions, enabling it to outperform existing predictors. Interestingly, for specific snoRNAs, snoGloBe identifies strong enrichment of interactions near gene expression regulatory elements including splice sites. Abundance and splicing of predicted targets were altered upon the knockdown of their associated snoRNA. Strikingly, the predicted snoRNA interactions often overlap with the binding sites of functionally related RNA binding proteins, reinforcing their role in gene expression regulation. SnoGloBe is also an excellent tool for discovering viral RNA targets, as shown by its capacity to identify snoRNAs targeting the heavily methylated SARS-CoV-2 RNA. Overall, snoGloBe is capable of identifying experimentally validated binding sites and predicting novel sites with shared regulatory function.
Article activity feed
SciScore for 10.1101/2021.09.14.460265:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Cell Lines
- Sentence: SNORD126 knockdown: HepG2 cells were cultured in complete Eagle’s Minimum Essential Medium (EMEM from Wisent) and passaged twice a week, according to ATCC guidelines.
  Resource: HepG2; suggested: (CLS Cat# 300198/p2277_Hep-G2, RRID:CVCL_0027)

Software and Algorithms
- Sentence: High-throughput RNA-RNA interaction analysis: The high-throughput RNA-RNA interaction datasets from PARIS (SRR2814761, SRR2814762, SRR2814763, SRR2814764 and SRR2814765), LIGR-seq (SRR3361013 and SRR3361017) and SPLASH (SRR3404924, SRR3404925, SRR3404936 and SRR3404937) were obtained from the short read archive SRA (https://www.ncbi.nlm.nih.gov/sra) using fastq-dump from the SRA toolkit (v2.8.2).
  Resource: https://www.ncbi.nlm.nih.gov/sra; suggested: (NCBI Sequence Read Archive (SRA), RRID:SCR_004891)
- Sentence: PCR duplicates were removed from LIGR-seq datasets using the script readCollapse.pl from the icSHAPE pipeline and the reads were trimmed using Trimmomatic version 0.35 with the following options: HEADCROP:5 ILLUMINACLIP:TruSeq3-SE.fa:2:30:4 TRAILING:20 MINLEN:25.
  Resource: Trimmomatic; suggested: (Trimmomatic, RRID:SCR_011848)
- Sentence: All the samples were analyzed using the PARIS pipeline as described in sections 3.7 and 3.8 from (Lu et al. 2018).
  Resource: PARIS; suggested: None
- Sentence: The RNA duplexes were assigned to genes using the annotation file described in (Boivin et al. 2018) to which missing rRNA annotations from RefSeq were added.
  Resource: RefSeq; suggested: (RefSeq, RRID:SCR_003496)
- Sentence: Interactions from snoRNABase and from the literature that were shorter than 13 nucleotides were padded by adding their flanking sequence to respect the length threshold.
  Resource: snoRNABase; suggested: None
- Sentence: (Fig 2E) Building the model: The model used is a gradient boosting classifier from scikit-learn (v0.21.3) (Pedregosa et al. 2011).
  Resource: scikit-learn; suggested: (scikit-learn, RRID:SCR_002577)
- Sentence: The gene ontology enrichment analysis of the predicted targets was done using g:Profiler (Raudvere et al. 2019).
  Resource: g:Profiler; suggested: (G:Profiler, RRID:SCR_006809)
- Sentence: The number of overlaps between the predicted interactions and the eCLIP regions was computed using BEDTools intersect -s.
  Resource: BEDTools; suggested: (BEDTools, RRID:SCR_006646)
- Sentence: RNA-seq analysis: The resulting base calls were converted to fastq files using bcl2fastq v2.20 (Illumina) with the following options: --minimum-trimmed-read-length 13, --mask-short-adapter-reads 13, --no-lane-splitting.
  Resource: bcl2fastq; suggested: (bcl2fastq, RRID:SCR_015058)
- Sentence: The sequence quality was assessed using FastQC v0.11.5 (Andrews) before and after the trimming.
  Resource: FastQC; suggested: (FastQC, RRID:SCR_014583)
- Sentence: The trimmed sequences were aligned to the human genome (hg38) using STAR v2.6.1a (Dobin et al. 2013) with the options --outFilterScoreMinOverLread 0.3, --outFilterMatchNminOverLread 0.3, --outFilterMultimapNmax 100, --winAnchorMultimapNmax 10, --alignEndsProtrude 5 ConcordantPair.
  Resource: STAR; suggested: (STAR, RRID:SCR_004463)
- Sentence: Only primary alignments were kept using samtools view -F 256 (v1.5) (Li et al. 2009).
  Resource: samtools; suggested: (SAMTOOLS, RRID:SCR_002105)
- Sentence: The gene quantification was done using CoCo correct_count -s 2 -p (v0.2.5p1) (Deschamps-Francoeur et al. 2019).
  Resource: CoCo; suggested: (CoCo, RRID:SCR_010947)
- Sentence: DESeq2 (Love et al. 2014) was used for the differential expression analysis.
  Resource: DESeq2; suggested: (DESeq, RRID:SCR_000154)
- Sentence: The alternative splicing analysis was done using MAJIQ v2.2 and VOILA with the option --threshold 0.1 (Vaquero-Garcia et al. 2016).
  Resource: MAJIQ; suggested: (MAJIQ, RRID:SCR_016706)

Results from OddPub: Thank you for sharing your code and data.
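One of the Table 2 sentences states that interactions shorter than 13 nucleotides were padded with their flanking sequence to respect the length threshold. The sketch below illustrates one way such padding could work on interval coordinates. It is not the authors' code: the function name `pad_interaction`, the 0-based half-open coordinates, the even left/right split of the padding, and the clamping to the sequence bounds are all assumptions made for illustration.

```python
def pad_interaction(start, end, seq_length, min_length=13):
    """Extend the 0-based, half-open interval [start, end) to min_length.

    Assumptions (not from the source): padding is split as evenly as
    possible between the two flanks, and the window is shifted back
    inside [0, seq_length) if it would overrun either end.
    """
    length = end - start
    if length >= min_length:
        return start, end  # already long enough, leave untouched

    missing = min_length - length
    left = missing // 2          # nucleotides added upstream
    right = missing - left       # nucleotides added downstream
    new_start = start - left
    new_end = end + right

    # Overran the 5' end: move the overflow to the 3' flank.
    if new_start < 0:
        new_end -= new_start
        new_start = 0
    # Overran the 3' end: move the overflow to the 5' flank.
    if new_end > seq_length:
        new_start -= new_end - seq_length
        new_end = seq_length

    # If the whole sequence is shorter than min_length, return all of it.
    return max(new_start, 0), new_end
```

For example, a 9-nucleotide interaction at positions 10 to 19 of a 100-nucleotide target would be extended by two nucleotides on each side, while an interaction touching either end of the target would take all of its padding from the opposite flank.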
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 50. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
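The primary-alignment filter quoted in Table 2 (`samtools view -F 256`) relies on the SAM bitwise FLAG field: bit 0x100 (256) marks a secondary alignment, and `-F` drops every record that has any of the given bits set. The sketch below mimics that logic on plain SAM text lines. The helper names `is_kept` and `filter_sam_lines` are hypothetical, and passing header lines through unchanged is an assumption for illustration (samtools itself prints the header only when asked to).

```python
SECONDARY = 0x100  # 256: SAM FLAG bit marking a secondary alignment


def is_kept(flag, exclude_mask=SECONDARY):
    """Return True if none of the bits in exclude_mask are set in flag."""
    return (flag & exclude_mask) == 0


def filter_sam_lines(lines, exclude_mask=SECONDARY):
    """Keep SAM records whose FLAG has no excluded bit set.

    Header lines (starting with '@') are passed through unchanged,
    which is an assumption of this sketch.
    """
    kept = []
    for line in lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if is_kept(flag, exclude_mask):
            kept.append(line)
    return kept
```

Because only bit 256 is excluded, supplementary alignments (bit 2048) and unmapped reads (bit 4) would still pass this filter.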
