CS-Fold: Advancing RNA Structure Predictions through Phylogenetic Modelling of Compensatory Mutations in Deep Neural Networks
Abstract
Accurate prediction of RNA secondary structures is essential for understanding the conformation, function, and interactions of RNA. Leveraging co-evolutionary information across species through multiple sequence alignments (MSAs) has proven effective in improving molecular structure prediction. However, existing deep learning approaches do not explicitly model compensatory substitutions along phylogenetic trees, even though these substitutions are crucial for capturing structural conservation through evolution. To address this, we developed CS-Fold, a novel deep learning framework that integrates compensatory substitution likelihoods, computed with likelihood estimation and Monte Carlo algorithms, as constraints. These likelihoods, which represent evolutionary changes along phylogenetic trees, are encoded in a sparse matrix that guides the attention mechanism of a Pairformer-based architecture. A custom loss function and an unrolled post-processing algorithm keep predictions within the solution space defined by these evolutionary constraints. Evaluated on cross-family datasets, including 604 human RNA families from the Rfam database, CS-Fold improves the F1 score by 5% over current mainstream approaches. By incorporating evolutionary information beyond traditional MSAs and folding strategies, our model offers new insights and a robust solution for RNA secondary structure prediction.
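The abstract does not say exactly how the sparse likelihood matrix enters the attention computation. A common pattern in Pairformer-style models is an additive pair bias: a per-pair score is added to the raw attention logits before the softmax, so that positions with strong evidence of pairing attend to each other more. The sketch below illustrates that generic technique under stated assumptions. The sparse dictionary encoding, the scalar `weight`, and the helper names `to_dense_bias` and `biased_attention` are hypothetical and are not CS-Fold's implementation.

```python
import math

# Hypothetical sparse encoding: only base pairs (i, j) with a non-zero
# compensatory-substitution score are stored, keyed by (i, j).
SparsePairs = dict[tuple[int, int], float]


def to_dense_bias(pairs: SparsePairs, length: int, weight: float = 1.0) -> list[list[float]]:
    """Expand sparse pair scores into a symmetric L x L additive bias (assumed symmetric)."""
    bias = [[0.0] * length for _ in range(length)]
    for (i, j), score in pairs.items():
        bias[i][j] = bias[j][i] = weight * score
    return bias


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)  # subtract the row maximum to avoid overflow in exp
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(logits: list[list[float]], pairs: SparsePairs, weight: float = 1.0) -> list[list[float]]:
    """Add the evolutionary pair bias to raw logits, then normalize each row."""
    length = len(logits)
    bias = to_dense_bias(pairs, length, weight)
    return [
        softmax([logits[r][c] + bias[r][c] for c in range(length)])
        for r in range(length)
    ]


if __name__ == "__main__":
    # Uniform raw logits for a 4-nt toy sequence; one hypothetical
    # compensatory-substitution score links positions 0 and 3.
    logits = [[0.0] * 4 for _ in range(4)]
    attn = biased_attention(logits, {(0, 3): 2.0})
    # Row 0 now attends most strongly to position 3; rows 1 and 2 stay uniform.
    for row in attn:
        print([round(a, 3) for a in row])
```

Storing only the non-zero pairs keeps the evolutionary signal compact for long sequences, since most position pairs carry no compensatory-substitution evidence; the dense expansion here is only for readability, and a real model would more likely add the bias per attention head inside a batched tensor operation.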