Deciphering enzymatic potential in metagenomic reads through DNA language models

R Prabakaran
Yana Bromberg

Abstract

Microbial communities drive essential global processes, yet much of their functional potential remains unexplored. Metagenomics stands to elucidate this microbial “dark matter” by directly sequencing the microbial community DNA from environmental samples. However, the exploration of metagenomic sequences is mostly limited to establishing their similarity to curated reference sequences. A paradigm shift—language model (LM)-based methods—offers promising avenues for reference-free analysis of metagenomic reads. Here, we introduce two LMs, a pretrained foundation model REMME (Read EMbedder for Metagenomic Exploration), aimed at understanding the DNA context of metagenomic reads, and the fine-tuned REBEAN (Read Embedding-Based Enzyme ANnotator) for predicting the enzymatic potential encoded within the read-corresponding genes. By emphasizing function recognition over gene identification, REBEAN labels gene-encoded molecular functions of previously explored and new (orphan) sequences. Even though it was not trained to do so, REBEAN identifies the gene’s function-relevant parts. It thus expands enzymatic annotation of unassembled metagenomic reads. Here, we present novel enzymes discovered using our models, highlighting model impact on our understanding of microbial communities.

Version published to 10.1093/nar/gkaf836 on Aug 27, 2025
Version published to 10.1101/2024.12.10.627786 on bioRxiv on Dec 11, 2024

