Two dimensional sequence alignment shows that replication slippage may generate a significant proportion of all transversion substitutions

Abstract

A new approach to DNA sequence alignment is introduced to expand the number of homology states between nucleotides. While standard gapped alignment (GA) operates under a two-state homology model of one-to-one and one-to-none (one-to-gap) relationships, a micro-paralogical gapped alignment (MPGA) approach adds one-to-many, many-to-many, and many-to-none relationships. This multi-state homology model is motivated by the DNA replication errors caused specifically by replication slippage (RS). RS produces short tandem repeats (TRs), constituting interrelated, micro-paralogous sequences. RS and TR-associated instability give rise to a major proportion of insertions and deletions, which require the insertion of gaps during multiple sequence alignment. While GA incurs the computational cost of determining optimal gap insertion, an unsolvable task with a two-state homology model, MPGA reduces the gap insertion task by reducing the overall number of gaps in 2D alignments. Two-dimensional self-alignment of a sequence occurs when tandem repeats are contracted into the same columns (dimension one) by occupying multiple rows (dimension two), an internal micro-paralogical dimension. A program called LINEUP is introduced to demonstrate the challenges and opportunities of 2D self-alignment of DNA sequences. It is then shown how 2D alignments can provide more precise measures of point mutation rates and transition-to-transversion ratios than 1D alignments. It is also shown how diversely conserved protein-coding sequences have a distinctive signature of dinucleotide repeat depletion and trinucleotide enrichment relative to non-protein-coding sequences and randomly shuffled, synthetic sequences. This trinucleotide enrichment occurs across all three reading frames. These results showcase significant new perspectives on basic mutational and evolutionary processes.
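To make the idea of contracting tandem repeats into shared columns concrete, here is a minimal Python sketch. It is not LINEUP's algorithm, which the abstract does not describe. It is a toy that assumes perfect (mismatch-free) tandem repeats, a greedy left-to-right scan, and a preference for the shortest repeating unit. The function name `fold_tandem_repeats` and the parameters `min_unit` and `max_unit` are hypothetical and chosen only for illustration.

```python
def fold_tandem_repeats(seq, min_unit=2, max_unit=6):
    """Toy 2D self-alignment (an illustrative assumption, not LINEUP).

    Scans left to right. Whenever a perfect tandem repeat starts at the
    current position, every copy after the first is written on a new row,
    indented so that all copies occupy the same columns.
    Returns the rows as a list of space-padded strings.
    """
    rows = []
    current = ""  # the row being built
    i = 0
    while i < len(seq):
        hit = None
        # Try the shortest unit first; stop when two copies cannot fit.
        for k in range(min_unit, max_unit + 1):
            if i + 2 * k > len(seq):
                break
            unit = seq[i:i + k]
            if seq[i + k:i + 2 * k] == unit:
                n = 2
                # Count further adjacent perfect copies of the unit.
                while seq[i + n * k:i + (n + 1) * k] == unit:
                    n += 1
                hit = (unit, n)
                break
        if hit is None:
            # No repeat starts here: extend the current row by one base.
            current += seq[i]
            i += 1
            continue
        unit, n = hit
        col = len(current)  # column where the repeat unit begins
        current += unit     # first copy stays on the current row
        for _ in range(n - 1):
            rows.append(current)
            current = " " * col + unit  # next copy stacks in the same columns
        i += n * len(unit)
    rows.append(current)
    return rows


if __name__ == "__main__":
    for row in fold_tandem_repeats("GGACACACTT"):
        print(row)
```

For the input `GGACACACTT`, the three `AC` copies stack into columns 3 and 4 across three rows, so the sequence occupies 6 columns instead of 10. Reading the rows in order and dropping the padding recovers the original sequence. A realistic method would also have to handle imperfect repeats, overlapping candidate units, and the alignment of several sequences at once, none of which this sketch attempts.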
