sxLaep: a Lightweight and Accurate Enzyme Predictor for High-throughput Mining of Metagenomic Sequences
Abstract
Motivation
Metagenomic sequencing generates petabyte-scale sequence datasets that strain both deep-learning and alignment-based enzyme annotation tools. A lightweight, rapid, and accurate pre-filter is needed to identify enzymatic sequences before resource-intensive functional prediction.
Results
We present sxLaep (Lightweight and Accurate Enzyme Predictor), a resource-efficient framework that uses lightweight physicochemical features for enzyme pre-screening. On the external validation set, sxLaep required only 0.002 s per sequence, 22.9-fold faster than Diamond (0.0457 s per sequence), and used 372.16 MB of peak memory, a 54.4% reduction relative to Diamond (815.64 MB). sxLaep achieved an accuracy of 99.34% and the highest recall in remote homology detection, recovering enzyme candidates missed by alignment-based methods. We further applied sxLaep to a marine metagenomic enzyme-mining workflow, demonstrating its utility for high-throughput discovery from large-scale metagenomic sequences.
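As a quick sanity check on the reported ratios, the short sketch below recomputes the fold speed-up and the memory reduction from the four figures quoted in the Results. The helper names (`fold_speedup`, `percent_reduction`) are illustrative and are not part of the sxLaep package; only the numeric constants come from the text.

```python
# Figures quoted in the Results (external validation set).
SXLAEP_S_PER_SEQ = 0.002     # sxLaep runtime, seconds per sequence
DIAMOND_S_PER_SEQ = 0.0457   # Diamond runtime, seconds per sequence
SXLAEP_PEAK_MB = 372.16      # sxLaep peak memory, MB
DIAMOND_PEAK_MB = 815.64     # Diamond peak memory, MB


def fold_speedup(baseline: float, tool: float) -> float:
    """How many times faster `tool` is than `baseline` (hypothetical helper)."""
    return baseline / tool


def percent_reduction(baseline: float, tool: float) -> float:
    """Relative reduction of `tool` versus `baseline`, in percent (hypothetical helper)."""
    return 100.0 * (1.0 - tool / baseline)


speedup = fold_speedup(DIAMOND_S_PER_SEQ, SXLAEP_S_PER_SEQ)
mem_reduction = percent_reduction(DIAMOND_PEAK_MB, SXLAEP_PEAK_MB)

print(f"speed-up: {speedup:.2f}-fold")
print(f"peak-memory reduction: {mem_reduction:.1f}%")
```

The memory figure reproduces the reported 54.4%. The speed-up computed from the rounded per-sequence times comes out slightly below 22.9, which is consistent with the reported value if 0.002 s is itself a rounded timing.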
Availability and Implementation
sxLaep is available as a Python package at https://pypi.org/project/sxlaep and is maintained as an open-source software repository at https://github.com/labxscut/sxLaep. Detailed installation, usage, and Docker deployment instructions are provided in the GitHub repository to support reproducible enzyme prediction and model execution.
Contact
lcxia@scut.edu.cn