ProtDAT: A Unified Multimodal Cross-Attention Framework for Ab-Initio Amino Acid Sequence Design from Any Protein Text Description
Abstract
Protein design has become a critical method with significant potential for applications such as drug development and enzyme engineering. However, protein design methods that rely on large language models trained solely through pretraining and fine-tuning struggle to capture relationships in multi-modal protein data. To address this, we propose ProtDAT, a fine-grained de novo framework capable of designing proteins from any descriptive protein text input. ProtDAT builds on the inherent characteristics of protein data to treat sequences and text as a cohesive whole rather than as separate entities. It uses a multi-modal cross-attention mechanism that integrates protein sequences and textual information at a foundational level. We use pLDDT, TM-score, and RMSD to evaluate the rationality, functionality, structural similarity, and validity of the generated protein sequences. In experiments on 20,000 text-sequence pairs from Swiss-Prot, ProtDAT shows significant improvements over the best-performing method in the comparison, with a 9.84% increase in pLDDT, a 76.45% increase in TM-score, and a 24.41% reduction in RMSD.
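To make the idea of cross-attention between sequence and text concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, in which protein-sequence token embeddings act as queries and text-description token embeddings supply the keys and values. It is an illustration under stated assumptions and not the ProtDAT module itself: the function names (`cross_attention`, `random_matrix`), the dimensions, and the random projection matrices are all hypothetical, and the abstract does not specify the actual architecture.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a single list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(seq_emb, text_emb, w_q, w_k, w_v):
    """Generic single-head cross-attention (assumed form, not ProtDAT's).

    Queries come from the sequence embeddings; keys and values come from
    the text embeddings, so each sequence position attends over the text.
    """
    q = matmul(seq_emb, w_q)    # (L_seq, d_k)
    k = matmul(text_emb, w_k)   # (L_text, d_k)
    v = matmul(text_emb, w_v)   # (L_text, d_k)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)     # (L_seq, L_text)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned embeddings or projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_k = 16, 8                      # illustrative sizes only
seq_emb = random_matrix(5, d_model, rng)  # 5 amino-acid tokens
text_emb = random_matrix(7, d_model, rng) # 7 text tokens
w_q = random_matrix(d_model, d_k, rng)
w_k = random_matrix(d_model, d_k, rng)
w_v = random_matrix(d_model, d_k, rng)

fused, weights = cross_attention(seq_emb, text_emb, w_q, w_k, w_v)
print(len(fused), len(fused[0]))  # 5 8
```

Each row of `weights` is a probability distribution over the text tokens for one sequence position, which is the sense in which the text conditions every position of the sequence representation in this sketch.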