Lapidary: Identifying and reporting amino acid sequences in metagenomes using sequence reads and Diamond

Samuel J Bloomfield
Aldert L Zomer
Alison E Mather


Abstract

Genome and metagenome comparisons rely on identifying genetic elements that differ or are shared between samples. These genetic elements can be identified either by assembling sequenced reads and searching the assembly for the element, or by aligning the nucleotide sequences of the reads to the nucleotide sequence of a reference genetic element. The first approach relies on the complete assembly of the genetic element of interest, and the second relies on a reference sequence represented in nucleotides. Assembly is particularly challenging with metagenome data, where genetic elements, including genes, are often fragmented because sequences are shared between different species in the sample, resulting in contig breaks in or around genetic elements. This makes it difficult to identify genetic elements through the first approach. A common alternative for metagenomes is to map reads against reference nucleotide sequences and extract the depth and coverage of those reference sequences. However, no software currently exists to identify and report genetic elements in metagenomes using DNA-protein alignments. We have developed the software Lapidary to determine the identity, coverage, depth, and most likely sequence of amino acid sequences from both genome and metagenome read files. We tested the effectiveness of the method on simulated, genomic and metagenomic read datasets. Lapidary is more sensitive than assembly methods for metagenomic data, which often have fragmented assemblies, but is less sensitive when assemblies are more complete, as is the case with genomic data.
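
To make the coverage and depth quantities named in the abstract concrete, the following is a minimal Python sketch of how they could be computed for one reference amino acid sequence from a set of read alignments. It is an illustration under stated assumptions, not Lapidary's implementation: the function names, the interval representation (1-based, inclusive start and end positions on the reference protein) and the example intervals are all hypothetical. In practice such intervals might be taken from the subject start and end coordinates of a translated alignment report.

```python
from typing import Iterable, List, Tuple


def per_position_depth(ref_len: int, hits: Iterable[Tuple[int, int]]) -> List[int]:
    """Count how many read alignments cover each reference amino acid position.

    Assumption: each hit is a (start, end) pair on the reference protein,
    1-based and inclusive. Hits are clipped to the reference length.
    """
    if ref_len <= 0:
        raise ValueError("ref_len must be positive")
    depth = [0] * ref_len
    for start, end in hits:
        lo = max(min(start, end), 1)        # tolerate reversed coordinates
        hi = min(max(start, end), ref_len)  # clip to the reference length
        for pos in range(lo, hi + 1):
            depth[pos - 1] += 1
    return depth


def coverage_and_mean_depth(ref_len: int, hits: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (fraction of positions covered at least once, mean depth per position)."""
    depth = per_position_depth(ref_len, hits)
    covered = sum(1 for d in depth if d > 0)
    return covered / ref_len, sum(depth) / ref_len


# Hypothetical example: three read alignments against a 10-residue reference.
example_hits = [(1, 4), (3, 8), (7, 8)]
coverage, mean_depth = coverage_and_mean_depth(10, example_hits)
print(f"coverage={coverage:.2f} mean_depth={mean_depth:.2f}")
```

In this hypothetical example, positions 9 and 10 receive no alignments, so 8 of the 10 positions are covered, and the 12 aligned residues spread over 10 positions give a mean depth of 1.2. Percent identity and the most likely sequence would need the aligned residues themselves, which this sketch does not model.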

Version published to 10.1101/2024.03.25.586564 on bioRxiv
Mar 28, 2024

