2D representations of DNA sequence show that most transversions are misaligned nucleotides associated with replication slippage
Abstract
Homologous sequences diverge in length via insertions and deletions (indels). Consequently, evolutionary genetic analyses routinely use methods to produce gapped alignment (GA). In GA, artificial null characters (gaps) are inserted into sequences so that nucleotide characters may be placed into homological correspondence within an alignment column. However, this approach sacrifices the homological correspondence of nucleotides diverging via tandem repeats (TRs). To address this deficit, we generalize GA with micro-paralogical gapped alignment (MPGA). While GA operates under a strict two-state homology model of one-to-one and one-to-none (i.e., one-to-gap) relationships, MPGA adds one-to-many, many-to-many, and many-to-none relationships. This expanded, multi-state homology model is motivated by DNA replication slippage (RS). RS produces short tandem repeats, constituting interrelated micro-paralogous sequences. Together, RS and TR-associated instability have a synergistic effect in the production of indels, which generate the need for gap insertions. MPGA reduces the computational cost of determining optimal gap insertions by reducing the number of gaps required by two-dimensional (2D) representations of sequence. A 2D representation of one sequence is achieved when tandem repeats are contracted into the same columns (dimension one) by occupying multiple rows (dimension two), an internal micro-paralogical dimension. To demonstrate the benefits and challenges of 2D representation, we develop a program called LINEUP and identify a pervasive fractal dimension in evolving sequences. We then demonstrate how LINEUP-generated 2D representations provide improved measures of substitution rates and transition-to-transversion ratios. Altogether, these results showcase significant new perspectives on basic mutational and evolutionary processes when multi-state homology models are adopted.
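The idea of contracting tandem repeats into shared columns can be illustrated with a toy sketch. This is not the LINEUP algorithm, whose details the abstract does not give; the function names `fold_tandem_repeat` and `width`, the example sequences, and the restriction to a single perfect repeat at a known position are all assumptions made for illustration.

```python
def fold_tandem_repeat(seq, start, unit_len, copies):
    """Stack `copies` perfect tandem copies of a repeat unit into rows.

    Hypothetical helper for illustration only: the repeat position and
    unit length are supplied by the caller rather than discovered.
    """
    if unit_len < 1 or copies < 2:
        raise ValueError("need unit_len >= 1 and copies >= 2")
    unit = seq[start:start + unit_len]
    end = start + unit_len * copies
    if seq[start:end] != unit * copies:
        raise ValueError("not a perfect tandem repeat at the given position")
    # Row 1: everything up to and including the first copy of the unit.
    rows = [seq[:start + unit_len]]
    # Later copies occupy the same columns as the first copy, one row each.
    for _ in range(1, copies):
        rows.append(" " * start + unit)
    # The flanking sequence continues on the last row, after the final copy.
    rows[-1] += seq[end:]
    return rows


def width(rows):
    """Number of columns used by a 2D representation."""
    return max(len(row) for row in rows)


# Two toy sequences that differ only in repeat copy number (3 vs. 2 copies of GT).
rows_a = fold_tandem_repeat("ACGTGTGTA", start=2, unit_len=2, copies=3)
rows_b = fold_tandem_repeat("ACGTGTA", start=2, unit_len=2, copies=2)

print("\n".join(rows_a))
print("width:", width(rows_a))
print("\n".join(rows_b))
print("width:", width(rows_b))
```

In this toy case both sequences occupy five columns once folded, so the length difference caused by the extra repeat copy is absorbed by an extra row rather than by gap characters in a column.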