CPP2Vec: a Representation Learning Approach for Cell-Penetrating Peptides Prediction
Abstract
Background
Cell-penetrating peptides (CPPs) facilitate the delivery of a variety of therapeutic molecules across the plasma membrane, from small chemical substances to nucleic acid-based macromolecules such as antisense oligonucleotides (ASOs). Among neutral ASOs, peptide nucleic acids (PNAs) and phosphorodiamidate morpholino oligomers (PMOs) have been extensively studied as potential medical treatments for Duchenne Muscular Dystrophy (DMD), a severe genetic disease that causes progressive muscle degeneration. Over the last few decades, many in silico methods have emerged to detect novel CPPs, counterbalancing the cost of wet-lab experiments.
Results
In this study, we propose CPP2Vec, a Word2Vec-based CPP prediction method in which the Word2Vec technique is used to represent the amino acid sequences of peptides. We developed three task-specific supervised machine learning models: CPP-Classification, Uptake-Efficiency, and PMO-Delivery. The first two models were designed to determine whether an unseen peptide is a CPP and to predict its uptake efficiency, respectively, while the PMO-Delivery model predicts whether a peptide could enhance the cellular delivery of a PMO-complex compared with the naked PMO. Furthermore, we explored an alternative approach using pre-trained protein-based Large Language Models (LLMs), namely T5, BERT, and ESM-2, to generate the embeddings, resulting in three task-specific models collectively named CPP2LLM. A comparison of CPP2Vec and CPP2LLM with state-of-the-art CPP prediction tools is included, demonstrating their significant predictive performance.
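As a rough illustration of how a peptide sequence can be turned into a fixed-length vector for a downstream supervised model, the sketch below splits each sequence into overlapping k-mers ("words") and averages their embedding vectors. This is not the authors' implementation: the k-mer tokenization, k=3, the 8-dimensional vectors, mean pooling, and the helper names (kmer_tokens, toy_embedding_table, peptide_vector) are all assumptions made for illustration, and the random lookup table is only a stand-in for a trained Word2Vec model.

```python
import random


def kmer_tokens(sequence, k=3):
    """Split a peptide into overlapping k-mers ("words"); k=3 is an assumption."""
    sequence = sequence.upper()
    if len(sequence) < k:
        # Sequences shorter than k are kept as a single token (or none if empty).
        return [sequence] if sequence else []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def toy_embedding_table(tokens, dim=8, seed=0):
    """Hypothetical stand-in for a trained Word2Vec model: one random vector per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for tok in sorted(set(tokens))}


def peptide_vector(sequence, table, k=3):
    """Average the vectors of a peptide's known k-mers into one fixed-length vector."""
    vectors = [table[t] for t in kmer_tokens(sequence, k) if t in table]
    if not vectors:
        raise ValueError("no known k-mers in sequence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Two well-known CPP sequences, used purely as example input.
peptides = ["GRKKRRQRRRPPQ", "RQIKIWFQNRRMKWKK"]
corpus = [kmer_tokens(p) for p in peptides]
table = toy_embedding_table([tok for doc in corpus for tok in doc])
features = [peptide_vector(p, table) for p in peptides]
```

Each entry of features has the same length regardless of peptide length, which is what allows the same representation to feed a classifier or a regressor. In practice the toy table would be replaced by vectors learned with Word2Vec on a peptide corpus.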
Conclusion
In this research, we present a Machine Learning (ML)-based tool that introduces the Word2Vec technique to the field of CPP prediction. Notably, it requires no manual a priori feature engineering and generalizes across the studied tasks without modification. CPP2Vec is available at: https://github.com/SSvolou/CPP2Vec .