Predicting Cell-Penetrating Peptide Uptake Mechanism from Sequence: A Machine Learning Approach
Abstract
Cell-penetrating peptides (CPPs) are promising drug delivery vectors, but their therapeutic efficacy depends critically on their cellular uptake mechanism. CPPs can enter cells via energy-dependent endocytosis or energy-independent direct translocation, with profound implications for cargo delivery and bioavailability. While numerous computational tools predict whether a peptide has cell-penetrating properties, none predict the uptake mechanism itself. Here, we present the first machine learning model specifically designed to predict CPP uptake mechanism from amino acid sequence. We curated a dataset of 142 CPPs with experimentally validated mechanisms from peer-reviewed literature. After removing sequences with >80% identity, 111 non-redundant peptides remained. Using nested 5-fold cross-validation with bootstrap confidence intervals, our best model (SVM-RBF) achieved an AUC-ROC of 0.795 [95% CI: 0.711-0.872], accuracy of 72.1%, and MCC of 0.447. Feature importance analysis revealed that hydrophobicity, leucine content, and basic residue ratio are key predictors, consistent with known biophysical mechanisms. Our model and dataset are freely available at https://github.com/Misterbra/cpp-mechanism-predictor.
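To make the sequence-derived predictors named above concrete, the following is a minimal sketch of how hydrophobicity, leucine content, and basic residue ratio could be computed from a peptide sequence. It is not the authors' feature pipeline. The abstract does not say which hydrophobicity scale or which definition of "basic residue" was used, so the Kyte-Doolittle hydropathy scale, the choice of lysine and arginine as the basic residues, and the function name `sequence_features` are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: the abstract does not name a hydrophobicity scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: "basic" means lysine and arginine only (histidine excluded).
BASIC_RESIDUES = frozenset("KR")


def sequence_features(sequence: str) -> dict:
    """Return three simple composition features for a peptide sequence.

    Hypothetical helper: the names and definitions are illustrative and
    are not taken from the published model.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    n = len(seq)
    return {
        # Mean hydropathy over all residues.
        "mean_hydrophobicity": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        # Fraction of residues that are leucine.
        "leucine_fraction": seq.count("L") / n,
        # Fraction of residues that are lysine or arginine.
        "basic_fraction": sum(aa in BASIC_RESIDUES for aa in seq) / n,
    }


if __name__ == "__main__":
    # TAT(47-57), a widely studied CPP, used only as an example input;
    # whether it appears in the curated dataset is not stated in the abstract.
    for name, value in sequence_features("YGRKKRRQRRR").items():
        print(f"{name}: {value:.3f}")
```

Features like these would typically be assembled into a per-peptide vector and passed to a classifier; the sketch stops short of that step because the abstract gives no detail on the full feature set or preprocessing.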