svaRetro and svaNUMT: Modular packages for annotation of retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data



Abstract

Background

The biological significance of structural variation is now more widely recognized. However, because few tools are available for downstream analysis, such as processing and annotation, interpretation of structural variant calls remains a challenge.

Findings

Here we present svaRetro and svaNUMT, R packages that provide functions for annotating novel genomic events such as non-reference retro-copied transcripts and nuclear integration of mitochondrial DNA. We evaluate the performance of these packages in detecting events using simulations and public benchmarking datasets, and annotate processed transcripts in a public structural variant database.

Conclusions

svaRetro and svaNUMT provide efficient, modular tools for downstream identification and annotation of structural variant calls.

Article activity feed

  1. Abstract

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.70), and the reviews have been published under the same license. They are as follows.

    Reviewer 1. Surajit Bhattacharya.

    The authors of the manuscript have tried to address a significant problem in genomics studies, i.e. annotation of non-coding elements of the genome. The authors have built two R tools to capture two non-coding regulatory elements: retroposed transcripts and nuclear mitochondrial integrations (NUMTs). The authors have illustrated the efficiency of the tools with examples using two datasets, and have also benchmarked the tools against other available tools. Although the authors have performed validations, some points still need to be clearly elucidated.

    Minor Points:

    1. On line 125, "BEDPE and Pairs [28]" should be written as "BEDPE [28] and pairs".
    2. Although the authors benchmark the two tools, could they briefly compare the time taken to run their tools against the tools they benchmark against? For example, compare the running time of svaRetro with GRIPper, and of svaNUMT with dinumt.
    3. This is not a question, but more of a comment. Is it possible to verify some of the novel variants identified by svaRetro and svaNUMT using PCR or another method? This could strengthen the claim that svaRetro and svaNUMT are better than the other tools.

    Reviewer 2. Gargi Dayama.

    Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

    Yes, although additional clarification of the features of svaRetro could be helpful.

    Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

    Yes. Additionally, it might be useful to state in the GitHub description the R version required to install the tool (it does not work with versions older than 4.1).

    Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

    No.

    1. The authors also need to benchmark their tools against the previously developed tools they used for comparison (dinumt and GRIPper) using the simulated data.
    2. The authors state that they found calls that were not found by the comparison tools. These calls need further testing to show that they are true positives; in fact, no test was done to assess false positives. Therefore, a true-positive/false-positive evaluation of the entire result set is essential.
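
    A minimal sketch of the metrics such an evaluation would typically report, assuming each call is checked against a validated truth set (for example, orthogonally validated events or a benchmark call set), where TP, FP and FN count confirmed calls, unconfirmed calls and missed truth-set events respectively. These are the standard definitions, not ones taken from the manuscript:

    $$
    \text{precision} = \frac{TP}{TP + FP}, \qquad
    \text{FDR} = \frac{FP}{TP + FP} = 1 - \text{precision}, \qquad
    \text{recall} = \frac{TP}{TP + FN}
    $$

    For example, if a false-positive annotation rate is read as an FDR, a rate of about 50% implies FP ≈ TP, i.e. roughly one unsupported annotation for every supported one, or a precision of about 0.5.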

    Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

    Yes, but there is a discrepancy for svaNUMT. The following command on GitHub, `NUMT <- svaNUMT::numtDetect(gr, numtS, genomeMT, max_ins_dist = 20)`, does not work. Instead, this worked: `NUMT <- svaNUMT::numtDetect(gr, max_ins_dist = 20)`.

    Additional comments were sent to the authors in an annotated file.

    Re-review: I feel the authors have addressed my comments. I have just one small comment about their statement in the Conclusions section, lines 359-360. They state that “svaRetro and svaNUMT demonstrated good performance on simulation and human cell line datasets similar to - or in some instances outperforming - other methods without re-analysis of alignment and the use of specialized detectors”. While this statement may be acceptable for the simulated data, the cell line results in lines 309-319 suggest that svaNUMT has an almost 50% false-positive annotation rate (although these annotations are low confidence). I feel this should be noted as a caveat in the conclusion and reported more clearly as false positives in the Results. Other than that, I do not have any additional comments.

    Reviewer 3. Raniere Gaia Costa da Silva.

    See the CODECHECK certificate of independent execution: https://doi.org/10.5281/zenodo.7084333

    See more in GigaBlog: http://gigasciencejournal.com/blog/frictionless-data-interactive-figures/