Machine Learning Framework for Multi-Label Antimicrobial Peptide Classification with Interpretable Feature Insights

Abstract

Antimicrobial peptides (AMPs) are promising alternatives for combating antibiotic resistance, but their multifunctionality poses significant classification challenges. This study introduces a multilabel classification framework that applies adaptation techniques to Extreme Gradient Boosting (XGB_multi), Random Forest (RF_multi), and Convolutional Neural Networks (CNN_multi) to predict AMP functionalities, including antibacterial, antifungal, antiviral, anticancer, and mammalian cell targeting. A dataset of 6,845 peptide sequences was curated from the dbAMPv2 database. Sequences were filtered for canonical amino acids and clustered at a 90% sequence identity threshold using CD-HIT (Cluster Database at High Identity with Tolerance) to ensure diversity and reduce redundancy. Features were extracted with the iFeatureOmega toolkit, combining sequence-based descriptors such as Amino Acid Composition (AAC) and Pseudo-Amino Acid Composition (PAAC) with physicochemical properties such as hydrophobicity, charge, and secondary structure. XGB_multi achieved the highest accuracy (0.919), outperforming RF_multi and CNN_multi in other functional predictions. CNN_multi excelled in the antibacterial, antifungal, and antiviral tasks with an AUC of 0.892, while RF_multi demonstrated high precision (0.861) and subset accuracy (0.689). These models outperformed existing tools such as AMPfun, MultiPep, iAMPpred, and AMP_scanner v2, achieving up to a 7.9% improvement in AUC for certain functionalities. Feature importance analysis identified charge, hydrophobicity, and structural attributes as critical contributors, with sequence-derived features such as PAAC and hydrophobicity5 emerging as highly discriminative. These findings provide a robust foundation for designing innovative antimicrobial therapeutics with multifunctional capabilities to combat drug-resistant pathogens.
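
To make three of the steps named in the abstract concrete, here is a minimal sketch of the canonical-residue filter, the AAC descriptor, and the subset-accuracy metric. It is an illustration only and not the authors' pipeline: the study used iFeatureOmega for feature extraction, and the function names, example peptides, and label order below are hypothetical assumptions. AAC is taken here in its usual sense (the fraction of each of the 20 canonical residues in a sequence), and subset accuracy as the share of samples whose full label vector is predicted exactly.

```python
from collections import Counter

# The 20 canonical amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed label order for illustration; the abstract lists these five functions.
LABELS = ["antibacterial", "antifungal", "antiviral",
          "anticancer", "mammalian_cell"]


def is_canonical(seq):
    """Return True if the sequence is non-empty and uses only canonical residues."""
    return len(seq) > 0 and set(seq) <= set(AMINO_ACIDS)


def aac(seq):
    """Amino Acid Composition: fraction of each canonical residue (20 values)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label vector matches exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Hypothetical peptides; the second contains a non-canonical 'X' and is dropped.
peptides = ["KKLLKK", "GIGAVLKX"]
kept = [p for p in peptides if is_canonical(p)]
features = [aac(p) for p in kept]

# Hypothetical multilabel targets and predictions (one row per peptide).
y_true = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
y_pred = [[1, 0, 1, 0, 0], [0, 1, 1, 0, 1]]
score = subset_accuracy(y_true, y_pred)
```

Subset accuracy is strict: the second prediction above gets four of five labels right yet counts as a miss. This is why a subset accuracy such as the reported 0.689 typically sits well below per-label accuracy or AUC figures.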
