Probabilistic Multiple Sequence Alignment using Spatial Transformations

Abstract

Multiple Sequence Alignment (MSA) has long been a prominent and critical tool in bioinformatics and computational biology. Its importance lies in its ability to provide valuable insights into the relationships between sequences and the evolutionary pressure leading to amino acid preferences at particular sites in a protein. Despite recent advances in protein language models, MSAs remain critical in many applications, e.g., for state-of-the-art prediction of 3D structure and protein variant effects. Sequence alignment is typically treated as a deterministic preprocessing step that produces a single static MSA. Especially for low-similarity sequences, parts of an alignment will be subject to substantial uncertainty, which is disregarded when processing a static MSA. Earlier HMM-based approaches handled this uncertainty by considering the full posterior ensemble over alignments. In this paper, we explore whether a similar approach is feasible within a modern deep learning framework, where we move beyond the Markovian restrictions of earlier models. In particular, we consider whether we can learn the alignment process as a distribution over spatial transformations, in combination with a deep latent variable model of protein sequences. A proof-of-concept implementation of this work is available at https://github.com/deltadedirac/Explicit_Disentanglement_Molecules.
