BrumiR: A toolkit for de novo discovery of microRNAs from sRNA-seq data

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

MicroRNAs (miRNAs) are small noncoding RNAs that are key players in the regulation of gene expression. In the past decade, with the increasing accessibility of high-throughput sequencing technologies, different methods have been developed to identify miRNAs, most of which rely on preexisting reference genomes. However, when a reference genome is absent or is not of high quality, such identification becomes more difficult. In this context, we developed BrumiR, an algorithm that is able to discover miRNAs directly and exclusively from small RNA (sRNA) sequencing (sRNA-seq) data. We benchmarked BrumiR with datasets encompassing animal and plant species using real and simulated sRNA-seq experiments. The results demonstrate that BrumiR reaches the highest recall for miRNA discovery, while at the same time being much faster and more efficient than the state-of-the-art tools evaluated. The latter allows BrumiR to analyze a large number of sRNA-seq experiments, from plants or animal species. Moreover, BrumiR detects additional information regarding other expressed sequences (sRNAs, isomiRs, etc.), thus maximizing the biological insight gained from sRNA-seq experiments. Additionally, when a reference genome is available, BrumiR provides a new mapping tool (BrumiR2reference) that performs an a posteriori exhaustive search to identify the precursor sequences. Finally, we also provide a machine learning classifier based on a random forest model that evaluates the sequence-derived features to further refine the prediction obtained from the BrumiR-core. The code of BrumiR and all the algorithms that compose the BrumiR toolkit are freely available at https://github.com/camoragaq/BrumiR.

Article activity feed

  1. AbstractMicroRNAs (miRNAs)

    This work has been peer reviewed in GigaScience ( see https://doi.org/10.1093/gigascience/giac093 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer name: Ernesto Picardi

    The manuscript by Moraga et al. describes BrumiR, a software devoted to the de novo identification of miRNAs from deep sequencing experiments of the RNA fraction at low molecular weight. In contrast with existing tools, BrumiR is based on de Bruijn graphs, generated directly from raw fastq reads. The performances on simulated and real sequencing data, in terms of precision, recall and FScore, are very good. In addition, the tool is ultra-fast, enabling the analysis of huge amount of data. I have tried to use BrumiR but I always got a GLIB error. I have tested the …

  2. AbstractMicroRNAs

    This work has been peer reviewed in GigaScience ( see https://doi.org/10.1093/gigascience/giac093 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer name: Marc Friedlander

    The authors here present BrumiR, a de Bruijn-based method to discover miRNAs independently of a reference genome. Today most miRNA discovery and annotation is done by mapping sequenced RNAs to readily available reference genomes and analyzing the mapping profiles. However, there are some uses cases where the genome-free approach is needed (particularly for species that have no reference genome or where the genomes have missing parts); therefore BrumiR could potentially be useful for the community. However, the comparison to existing tools needs to be done in a more …

  3. Abstract

    This work has been peer reviewed in GigaScience ( see https://doi.org/10.1093/gigascience/giac093 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer name: Dadi Gao

    Summary: The authors developed a de novo assembly method, BrumiR, for small RNA sequencing data based on de Bruijin graph algorithm. This tool displayed a relatively high sensitivity in finding miRNAs and helped the authors discover a novel miRNA in A. thaliana roots.

    Major comments:

    Have the authors compare the performance with different seed length? Even if the minimal miR length is 18nt in MiRBase 21, seed=18 might not necessarily lead to the best AUC or F score (This might also be related to Comment 4).

    The authors need to benchmark BrumiR with more existing tools (e.g. those …