svaRetro and svaNUMT: modular packages for annotating retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data

Abstract

Nuclear integration of mitochondrial genomes and retrocopied transcript insertion are biologically important but often-overlooked aspects of structural variant (SV) annotation. Tools for detecting them exist, but these typically rely on reanalysis of primary data using specialised detectors rather than on calls from general-purpose structural variant callers. Such reanalysis can add computational expense and does not take advantage of advances in general-purpose structural variant calling. Here, we present svaRetro and svaNUMT, R packages that provide functions for annotating novel genomic events, such as nonreference retrocopied transcripts and nuclear integration of mitochondrial DNA. The packages were developed to work within the Bioconductor framework. We evaluate how well these packages detect such events using simulations and public benchmarking datasets, and we use them to annotate processed transcripts in a public structural variant database. svaRetro and svaNUMT provide modular, SV-caller-agnostic tools for downstream annotation of structural variant calls.


This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.70), and the journal has published the reviews under the same license. They are as follows.

**Reviewer 1. Surajit Bhattacharya**

The authors of the manuscript have tried to address a significant problem in genomics, i.e. the annotation of non-coding elements of the genome. The authors have built two R tools to capture two non-coding regulatory elements: retroposed transcripts and nuclear integrations of mitochondrial DNA (NUMTs). The authors have illustrated the efficiency of the tools with examples using two datasets, and have also benchmarked the tools against other available tools. Although the authors have performed validations, some points still need to be clearly elucidated.

Minor Points:

On line 125, "BEDPE and Pairs [28]" should be written as "BEDPE [28] and pairs".
Although the authors benchmark the two tools, could they briefly compare the running time of their tools with that of the tools they benchmark against? For example, compare the running time of svaRetro with GRIPper, and of svaNUMT with dinumt.
This is not a question but rather a comment: is it possible to verify some of the novel variants identified by svaRetro and svaNUMT using PCR or any other method? This would strengthen the claim that svaRetro and svaNUMT perform better than the other tools.

**Reviewer 2. Gargi Dayama**

Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

Yes, although additional clarification of the features of svaRetro would be helpful.

Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

Yes. Additionally, it might be useful to state in the GitHub description the R version required to install the tool (it does not work with versions older than 4.1).

Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

No. 1) The authors also need to benchmark their tools on the simulated data against the previously developed tools they used for comparison (dinumt and GRIPper).

2) The authors state that they found calls that were not found by the other tool. These calls need further testing to show that they are true positives. In fact, no test was done to assess false positives. Therefore, assessing the entire set of results for false and true positives is essential.

Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

Yes, but there is a discrepancy for svaNUMT. The following command on GitHub does not work: `NUMT <- svaNUMT::numtDetect(gr, numtS, genomeMT, max_ins_dist = 20)`. Instead, this worked: `NUMT <- svaNUMT::numtDetect(gr, max_ins_dist = 20)`.

Additional comments were sent to the authors in an annotated file.

Re-review: I feel the authors have addressed my comments. I have just one small comment about their statement in the Conclusion section, lines 359-360. They state that “svaRetro and svaNUMT demonstrated good performance on simulation and human cell line datasets similar to - or in some instances outperforming - other methods without re-analysis of alignment and the use of specialized detectors”. While this statement may hold for the simulated data, the cell line results in lines 309-319 suggest that svaNUMT has an almost 50% false positive annotation rate (albeit for low-confidence calls). I feel this should be acknowledged as a caveat in the conclusion, and the false positives should be reported more clearly in the results. Other than that, I have no additional comments.

**Reviewer 3. Raniere Gaia Costa da Silva**

See the CODECHECK certificate of independent execution: https://doi.org/10.5281/zenodo.7084333

See more in GigaBlog: http://gigasciencejournal.com/blog/frictionless-data-interactive-figures/
