sxLaep: a Lightweight and Accurate Enzyme Predictor for High-throughput Mining of Metagenomic Sequences

Abstract

Motivation

Metagenomic sequencing generates petabyte-scale sequence datasets that strain both deep learning-based and alignment-based enzyme annotation tools. A lightweight, rapid, and accurate pre-filter is needed to identify enzymatic sequences before resource-intensive functional prediction.

Results

We present sxLaep (Lightweight and Accurate Enzyme Predictor), a resource-efficient framework that uses lightweight physicochemical features for enzyme pre-screening. On the external validation set, sxLaep completed prediction in only 0.002 s/sequence, 22.9-fold faster than Diamond (0.0457 s/sequence). It used 372.16 MB of peak memory, a 54.4% reduction relative to Diamond (815.64 MB). sxLaep achieved an accuracy of 99.34% and the highest recall in remote homology detection, recovering enzyme candidates missed by alignment-based methods. We further applied sxLaep to a marine metagenomic enzyme-mining workflow, demonstrating its utility for high-throughput discovery from large-scale metagenomic sequences.
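The speed and memory comparisons above follow from simple ratios, which the short sketch below reproduces from the quoted figures. It is only an arithmetic check, not part of sxLaep: the helper names and the one-million-sequence workload are hypothetical, chosen for illustration. Because the per-sequence time is reported to one significant figure (0.002 s), the recomputed speedup comes out near 22.85 rather than exactly 22.9.

```python
# Figures quoted in the abstract (external validation set).
SXLAEP_S_PER_SEQ = 0.002
DIAMOND_S_PER_SEQ = 0.0457
SXLAEP_PEAK_MB = 372.16
DIAMOND_PEAK_MB = 815.64


def fold_speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


def percent_reduction(baseline: float, candidate: float) -> float:
    """Relative reduction of the candidate versus the baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


def hours_for(n_sequences: int, s_per_seq: float) -> float:
    """Wall-clock hours for n_sequences at a fixed per-sequence cost."""
    return n_sequences * s_per_seq / 3600.0


speedup = fold_speedup(DIAMOND_S_PER_SEQ, SXLAEP_S_PER_SEQ)
memory_saving = percent_reduction(DIAMOND_PEAK_MB, SXLAEP_PEAK_MB)

# Hypothetical workload (assumption, not from the paper): one million sequences.
N_SEQUENCES = 1_000_000
sxlaep_hours = hours_for(N_SEQUENCES, SXLAEP_S_PER_SEQ)
diamond_hours = hours_for(N_SEQUENCES, DIAMOND_S_PER_SEQ)

print(f"speedup: {speedup:.2f}x")
print(f"peak-memory reduction: {memory_saving:.1f}%")
print(f"1e6 sequences: sxLaep {sxlaep_hours:.2f} h, Diamond {diamond_hours:.2f} h")
```

The extrapolation assumes per-sequence cost stays constant with dataset size, which the abstract does not state; real throughput would depend on hardware, I/O, and batching.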

Availability and Implementation

sxLaep is available as a Python package at https://pypi.org/project/sxlaep, and its source code is maintained in an open-source repository at https://github.com/labxscut/sxLaep. Detailed installation, usage, and Docker deployment instructions are provided in the GitHub repository to support reproducible enzyme prediction and model execution.

Contact

lcxia@scut.edu.cn
