Reverse Prediction of Carbohydrate Esterase Polysaccharide Targets

Abstract

Background

Carbohydrate esterases (CEs) catalyze the selective removal of ester-linked substituents from complex polysaccharides, influencing biomass bioprocessing. Predicting CE substrate specificity remains challenging due to the functional diversity and limited experimental characterization of CEs. Classic sequence-based approaches often fail to capture functional nuances across distant homologs.

Results

Here, we introduce a reverse prediction framework that leverages genomic context, specifically polysaccharide utilization loci (PULs), to infer natural polysaccharide targets of CE families. By integrating motif-based functional groups with large-scale co-occurrence analysis across Bacteroidota genomes, we identify substrate preferences at the clade level for 20 CE families. Subdivision of families into clades mitigated any polyspecificity observed when families were treated as a whole, highlighted unexplored regions within CE1, CE2, CE3, CE6, CE7, CE14, CE15, CE19, CE20, and partly within CE8 and CE12, and expanded functional coverage by up to 50% compared to characterized members alone.

Conclusions

The approach and the data reveal substantial functional subdivision of the CE families, enabling prediction of specific targets including arabinoxylan, β-mannan, pectin substructures, and glycosaminoglycans. Reverse prediction thus offers a powerful tool for guiding enzyme discovery and designing tailored enzyme cocktails for biomass valorization.
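The co-occurrence idea behind reverse prediction can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the PUL records, clade labels, and function names below are hypothetical, and the abstract does not specify how clades are defined or how co-occurrence is scored. The sketch only assumes that each PUL carries a predicted substrate and a set of CE clades, and it tallies how often each clade appears alongside each substrate.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (predicted PUL substrate, CE clades found in that PUL).
# These records are invented for illustration and are not data from the article.
puls = [
    ("arabinoxylan", {"CE1_clade_a", "CE6_clade_b"}),
    ("arabinoxylan", {"CE1_clade_a"}),
    ("beta-mannan", {"CE2_clade_c"}),
    ("pectin", {"CE8_clade_d", "CE12_clade_e"}),
    ("beta-mannan", {"CE1_clade_a", "CE2_clade_c"}),
]


def substrate_profiles(records):
    """Return, per clade, the fraction of its PULs assigned to each substrate."""
    counts = defaultdict(Counter)
    for substrate, clades in records:
        for clade in clades:
            counts[clade][substrate] += 1
    profiles = {}
    for clade, per_substrate in counts.items():
        total = sum(per_substrate.values())
        profiles[clade] = {s: n / total for s, n in per_substrate.items()}
    return profiles


def top_target(profiles, clade):
    """Return the (substrate, fraction) pair with the highest fraction for a clade."""
    return max(profiles[clade].items(), key=lambda item: item[1])


profiles = substrate_profiles(puls)
```

In this toy data, a clade that only ever occurs in β-mannan PULs gets a profile concentrated on one substrate, whereas a clade spread over several substrates would appear polyspecific. A real analysis would also need to correct for how common each substrate is and for phylogenetic redundancy among genomes, which this sketch does not attempt.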
