Ali-U-Net: A Convolutional Transformer Neural Net for Multiple Sequence Alignment of DNA Sequences. A proof of concept

Abstract

We report a convolutional transformer neural network capable of aligning multiple nucleotide sequences. The network is based on the U-Net, an architecture commonly used in image segmentation, which we employ to transform unaligned sequences into aligned sequences. For the alignment scenarios it has been trained on, our Ali-U-Net is in most cases more accurate than programs such as MAFFT, T-Coffee, MUSCLE, and Clustal Omega, while being considerably faster than similarly accurate programs on a single CPU core. Its limitations are that the network is still trained specifically for certain alignment problems and can perform poorly on gap distributions it has not seen before. Furthermore, the algorithm currently works with fixed-size alignment windows of 48×48 or 96×96 nucleotides. At this stage, we view our study as a proof of concept and are confident that the present findings can be extended to larger alignments and more complex alignment scenarios in the near future.
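
To make the fixed-size window idea concrete, the following is a minimal sketch of how a set of unaligned sequences could be packed into a 48×48 grid and one-hot encoded, much as an image is prepared for a U-Net. The abstract does not describe the actual input encoding, so everything here is an assumption for illustration: the symbol set `ACGT-`, the right-padding with gaps, and the helper names `encode_window` and `decode_window` are all hypothetical.

```python
import numpy as np

ALPHABET = "ACGT-"  # assumed symbol set; '-' marks a gap or padding
WINDOW = 48         # window size mentioned in the abstract (48x48)


def encode_window(seqs, window=WINDOW):
    """One-hot encode up to `window` sequences of length <= `window`.

    Hypothetical helper: short sequences and unused rows are filled
    with gap symbols so the result always has a fixed shape.
    """
    if len(seqs) > window:
        raise ValueError("too many sequences for one window")
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    # Start from a grid that contains only gap symbols.
    grid = np.full((window, window), index["-"], dtype=np.int64)
    for row, seq in enumerate(seqs):
        if len(seq) > window:
            raise ValueError("sequence longer than the window")
        for col, ch in enumerate(seq.upper()):
            grid[row, col] = index[ch]  # KeyError on symbols outside ALPHABET
    # Index an identity matrix to turn symbol ids into one-hot vectors.
    return np.eye(len(ALPHABET), dtype=np.float32)[grid]


def decode_window(onehot):
    """Map a (rows, cols, channels) array back to strings via argmax."""
    ids = onehot.argmax(axis=-1)
    return ["".join(ALPHABET[i] for i in row) for row in ids]


encoded = encode_window(["ACGT", "AC"])
decoded = decode_window(encoded)
```

Under these assumptions, a network of the kind described would take an array shaped like `encoded` as input and produce an array of the same shape, which `decode_window` would turn back into aligned sequences.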
