High-performance machine learning for peptide classification from nanopore translocation events, leveraging event kinetics and duration filtering


Abstract

Understanding single-molecule translocation dynamics through biological nanopores is fundamental to advancing next-generation biosensing and sequencing technologies. Here, using the anthrax toxin protective antigen nanopore, we describe a high-performance machine learning (ML) framework for classifying a diverse series of guest-host peptides based on individual translocation events. The approach leverages carefully engineered, event-level biophysical features extracted from either scaled current or conductance state sequences. Through systematic UMAP analysis of this feature space, we reveal that filtering out the shortest events effectively enriches the dataset with more discriminative longer events, leading to improved classification. Various deep learning (DL) and traditional ML architectures, including convolutional neural networks (CNN), temporal convolutional networks (TCN), and eXtreme Gradient Boosting (XGBoost), were investigated. The dual-input CNN-Dense model, which utilized current sequences and features, achieved strong classification performance (accuracy ∼0.80). However, the most robust classification was achieved with XGBoost acting solely on the engineered feature set, demonstrating superior performance (accuracy ∼0.90). XGBoost also provided a significant computational advantage over the DL models in both training and inference. Notably, these models consistently discriminated between peptides differing only in backbone stereochemistry, highlighting the exquisite sensitivity of the nanopore to subtle conformational dynamics. These findings underscore that carefully engineered event-level features, particularly from longer translocations, combined with efficient tree-based models, offer a highly effective and computationally favorable strategy for high-fidelity peptide classification for biosensing applications.
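As a rough illustration of the duration filtering and event-level feature extraction described above, the following minimal Python sketch drops events shorter than a threshold and computes a few summary statistics per event. The `Event` class, the feature names, the 1 ms threshold, and the toy current values are all assumptions made for illustration; the abstract does not specify the actual features or cutoff used.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class Event:
    """One translocation event (hypothetical container, not from the article)."""
    label: str            # peptide identity
    current: List[float]  # scaled current samples for this event
    dt: float             # sampling interval in seconds

    @property
    def duration(self) -> float:
        # Event duration = number of samples * sampling interval
        return len(self.current) * self.dt


def filter_by_duration(events: List[Event], min_duration: float) -> List[Event]:
    """Keep only events lasting at least min_duration seconds."""
    return [e for e in events if e.duration >= min_duration]


def event_features(event: Event) -> Dict[str, float]:
    """Illustrative event-level features; the real feature set is not given."""
    x = event.current
    return {
        "duration": event.duration,
        "mean_current": mean(x),
        "std_current": pstdev(x),
        "min_current": min(x),
        "max_current": max(x),
    }


# Toy data: one very short event and two longer ones (values are made up).
events = [
    Event("A", [0.2, 0.3], 1e-4),
    Event("A", [0.2, 0.25, 0.3, 0.25] * 50, 1e-4),
    Event("B", [0.5, 0.6, 0.5, 0.4] * 100, 1e-4),
]

kept = filter_by_duration(events, min_duration=1e-3)  # assumed 1 ms cutoff
features = [event_features(e) for e in kept]
labels = [e.label for e in kept]
```

The resulting `features` and `labels` lists form the kind of tabular input that a gradient-boosted tree classifier such as XGBoost could be trained on; the classifier itself is omitted here to keep the sketch dependency-free.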
