Algorithms to reconstruct past indels: the deletion-only parsimony problem

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Ancestral sequence reconstruction is an important task in bioinformatics, with applications ranging from protein engineering to the study of genome evolution. When sequences can only undergo substitutions, optimal reconstructions can be efficiently computed using well-known algorithms. However, accounting for indels in ancestral reconstructions is much harder. First, for biologically-relevant problem formulations, no polynomial-time exact algorithms are available. Second, multiple reconstructions are often equally parsimonious or likely, making it crucial to correctly display uncertainty in the results.

Here, we consider a parsimony approach where any indel event has the same cost, irrespective of its size or the branch where it occurs. We thoroughly examine the case where only deletions are allowed, while addressing the aforementioned limitations. First, we describe an exact algorithm to obtain all the optimal solutions. The algorithm runs in polynomial time if only one solution is sought. Second, we show that all possible optimal reconstructions for a fixed node can be represented using a graph computable in polynomial time. While previous studies have proposed graph-based representations of ancestral reconstructions, this result is the first to offer a solid mathematical justification for this approach. Finally we discuss the relevance of the deletion-only case for the general case.

Author summary

An exciting frontier in evolutionary biology is the ability to reconstruct DNA or protein sequences from species that lived in the distant past. By analyzing sequences from present-day species, we aim to infer the sequences of their common ancestors —a process known as ancestral sequence reconstruction. This task has far-reaching applications, such as resurrecting ancient proteins and studying the biology of extinct organisms. However, a significant challenge remains: the lack of well-established methods for inferring past deletions and insertions —–mutations that remove or add segments of genetic code. In this paper, we present algorithms that lay the groundwork for addressing this gap. We show that finding the reconstructions involving only deletion events, while minimizing their number, can be done efficiently. Additionally, we show that all optimal solutions can be represented using specialized graphs. While previous studies have proposed graph-based representations of ancestral reconstructions, we are the first to provide a rigorous mathematical foundation for the use of these graphs.

Article activity feed