A data-driven rediscovery of the specificity-conferring code of adenylation domains in nonribosomal peptide synthetases

Zhengjian Li
Kenan A.J. Bozhüyük
Olga V Kalinina
Dietrich Klakow

Abstract

Nonribosomal peptide synthetases (NRPSs) are large modular enzymes that assemble structurally diverse peptides, many of pharmacological importance, including antibiotics and immunosuppressants. Within each NRPS module, the adenylation (A) domain selects the substrate to be incorporated, a choice governed by a small set of residues lining the binding pocket. For two decades, computational prediction of A-domain substrate specificity has relied on residue sets—most prominently the Stachelhaus code and the 34-residue “8 Å code”—that were defined by spatial proximity to the substrate rather than by demonstrated predictive value. Here we revisit which residues govern substrate specificity from a purely data-driven perspective. We assembled a non-redundant dataset of 5,366 A-domain sequences (4,693 bacterial and 673 fungal) and used information-theoretic measures to rank alignment positions by their statistical association with substrate identity, without restricting candidate positions to any predefined structural shell. This procedure yielded two compact, kingdom-specific codes: IG15B (15 positions) for bacterial and IG13F (13 positions) for fungal A-domains. Both match or exceed the predictive accuracy of the 34-residue 8 Å code while using fewer than half its positions, and both independently recover the majority of the classical Stachelhaus positions. Notably, our analysis identifies four positions (242, 280, 281, and 284) that lie outside all conventional codes yet carry non-redundant specificity information and co-localize with classical determinants on two helices flanking the binding pocket. These positions provide new candidate sites for the rational engineering of A-domain specificity.
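The ranking step described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and that each carries one substrate label. The function names, the toy alignment, and the choice to treat every character (including any gap symbol) as an ordinary residue state are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """H(labels) minus H(labels | residue observed in this alignment column)."""
    n = len(labels)
    groups = {}
    for residue, label in zip(column, labels):
        groups.setdefault(residue, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_positions(aligned_seqs, labels):
    """Return (column_index, information_gain) pairs, highest gain first."""
    columns = zip(*aligned_seqs)  # transpose: one tuple of residues per column
    scores = [(i, information_gain(col, labels)) for i, col in enumerate(columns)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy alignment: only column 1 tracks the substrate label.
toy_seqs = ["ADK", "ADR", "AWK", "AWR"]
toy_labels = ["Phe", "Phe", "Trp", "Trp"]
ranking = rank_positions(toy_seqs, toy_labels)
```

In this toy case the fully conserved column and the column that varies independently of the label both score zero, while the column that separates the two substrates receives the full one bit of label entropy. A compact code in the spirit of IG15B or IG13F would then correspond to keeping the top-ranked columns, although the selection criteria actually used are not specified in the abstract.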

Author summary

Many clinically important drugs, including antibiotics such as vancomycin and immunosuppressants such as cyclosporin, are nonribosomal peptides, assembled by large enzymes known as nonribosomal peptide synthetases. These enzymes contain adenylation domains that act as molecular gatekeepers, each selecting one chemical building block to add to a growing peptide. Identifying which amino acids within a domain determine this choice is central both to predicting what an enzyme produces and to re-engineering it to make new compounds. For over twenty years, researchers have approached this question by selecting the amino acids that sit physically closest to the substrate. However, being close to the substrate does not guarantee that a residue actually influences substrate selection. In this work, we instead let the data decide: using thousands of adenylation domain sequences, we measured which positions are statistically most informative about the substrate, as quantified by information gain, mutual information, and the χ² statistic. We found that far fewer positions than conventionally used are sufficient to predict specificity and, importantly, we identified several influential positions that earlier approaches had overlooked because they lie just beyond the conventional distance cutoff. These positions offer promising new targets for engineering these enzymes to produce novel peptide-based drugs.
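Of the three measures named above, the χ² statistic is perhaps the least self-explanatory, so a small sketch may help. It computes the Pearson χ² statistic for the contingency table of residue against substrate at a single alignment position. The function name and toy data are hypothetical, and the summary does not say how the authors handled sparse cells or gaps, so none of that is modelled here.

```python
from collections import Counter


def chi_squared(column, labels):
    """Pearson chi-squared statistic for the residue-by-substrate table."""
    n = len(labels)
    observed = Counter(zip(column, labels))  # joint counts; missing cells read as 0
    residue_totals = Counter(column)         # row marginals
    label_totals = Counter(labels)           # column marginals
    stat = 0.0
    for residue, r_total in residue_totals.items():
        for label, l_total in label_totals.items():
            expected = r_total * l_total / n  # count expected under independence
            stat += (observed[(residue, label)] - expected) ** 2 / expected
    return stat


# Hypothetical toy position: the residue determines the substrate exactly.
labels = ["Phe", "Phe", "Trp", "Trp"]
score = chi_squared("DDWW", labels)
```

A position whose residues are distributed independently of the substrate gives a statistic of zero, and larger values indicate stronger association. Because the raw statistic grows with sample size and with the number of residue and substrate classes, comparing positions fairly would likely need a normalization or a p-value, which this sketch omits.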

Version published to 10.64898/2026.06.15.732251 on bioRxiv
Jun 18, 2026
