AnchoRNA: Full virus genome alignments through conserved anchor regions

Abstract

Multiple sequence alignment of full viral genomes can be challenging due to long sequences, large insertions and deletions (spanning several hundred nucleotides), large numbers of sequences, sequence divergence, and high computational complexity, particularly when alignments are computed based on RNA secondary structures. Standard alignment methods often struggle with these issues, especially when processing highly variable sequences or when a specific phylogenetic analysis is required on selected subsequences.

We present an algorithm to determine high-quality anchors that define partitions of sequences and guide the alignment of viral genomes to respect well-conserved, and therefore functionally significant, regions. This new approach is implemented in the Python-based command-line tool AnchoRNA, which is designed to identify conserved regions, or anchors, within coding sequences. By default, anchors are searched in translated coding sequences to account for the high mutation rates in viral genomes. AnchoRNA enhances the accuracy and efficiency of full-genome alignment by focusing on these crucial conserved regions. AnchoRNA-guided alignments are systematically compared to the results of three alignment programs. Using a dataset of 55 representative Pestivirus genomes, AnchoRNA identified 55 anchors that are used to guide the alignment process. Incorporating these anchors improved results across the tested alignment tools, highlighting the effectiveness of AnchoRNA in enhancing alignment quality, especially for viral genomes.
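
The idea of searching for anchors in translated coding sequences and then partitioning the genomes at those anchors can be illustrated with a toy sketch. This is not AnchoRNA's algorithm or API, which the abstract does not detail. It assumes a much simpler stand-in: an anchor is an amino-acid word of length k that occurs exactly once in every sequence, and anchors must appear in the same order in all sequences. The function names `translate`, `unique_words`, `find_anchors`, and `split_at_anchors` are hypothetical, as are the two example sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence (frame 0); unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def unique_words(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start index."""
    seen = {}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        seen[word] = i if word not in seen else None  # None marks repeats
    return {w: i for w, i in seen.items() if i is not None}


def find_anchors(proteins, k=4):
    """Return [(word, starts)] for words unique in, and shared by, all proteins.

    Simplifying assumption: exact matches only. Candidates are scanned in the
    order of the first sequence and kept only if they are collinear and
    non-overlapping in every sequence.
    """
    tables = [unique_words(p, k) for p in proteins]
    shared = set(tables[0]).intersection(*tables[1:])
    candidates = sorted((tuple(t[w] for t in tables), w) for w in shared)
    anchors, last_end = [], [0] * len(proteins)
    for starts, word in candidates:
        if all(s >= e for s, e in zip(starts, last_end)):
            anchors.append((word, starts))
            last_end = [s + k for s in starts]
    return anchors


def split_at_anchors(cds, starts_aa, k):
    """Cut a nucleotide CDS into [flank, anchor, flank, ..., flank] pieces."""
    pieces, pos = [], 0
    for s in starts_aa:
        pieces.append(cds[pos:3 * s])          # region before the anchor
        pieces.append(cds[3 * s:3 * (s + k)])  # the anchor itself
        pos = 3 * (s + k)
    pieces.append(cds[pos:])
    return pieces


# Two made-up coding sequences that share a protein-level word but use
# different synonymous codons for it.
seqs = ["ATGAAATGGCATTGTATGGCT", "ATGTGGCACTGCATGGGAGCA"]
proteins = [translate(s) for s in seqs]
anchors = find_anchors(proteins, k=4)
partitions = [
    split_at_anchors(s, [starts[i] for _, starts in anchors], k=4)
    for i, s in enumerate(seqs)
]
```

In this example the shared word is found at the protein level even though the underlying nucleotides differ, which is the motivation for searching translated sequences. The flanking pieces between consecutive anchors could then be aligned independently by any alignment program and concatenated.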
