Lapidary: Identifying and reporting amino acid sequences in metagenomes using sequence reads and Diamond

Abstract

Genome and metagenome comparisons rely on identifying genetic elements that differ or are shared between samples. These genetic elements can be identified either by assembling sequenced reads and locating the genetic element in the assembly, or by aligning the nucleotide sequences of the reads to the nucleotide sequence of a reference genetic element. The first approach relies on complete assembly of the genetic element of interest, and the second relies on a reference sequence represented in nucleotides. Complete assembly is particularly challenging with metagenome data, where genetic elements, including genes, are often fragmented because sequences are shared between different species in the sample, resulting in contig breaks in or around genetic elements. A common approach with metagenomes is to map reads against reference nucleotide sequences and extract the depth and coverage of those reference sequences. However, no software currently exists to identify and report genetic elements in metagenomes using DNA-protein alignments. We have developed the software Lapidary to report the identity, coverage, depth, and most likely sequence of amino acid sequences from both genome and metagenome read files. We tested the effectiveness of the method on simulated, genomic, and metagenomic read datasets. Lapidary is more sensitive than assembly methods for metagenomic data, which often have fragmented assemblies, but is less sensitive when assemblies are more complete, as is the case with genomic data.
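
To make the reported quantities concrete, the following is a minimal Python sketch of how coverage, depth, and a most likely sequence could be derived from read-to-protein alignments. It is not Lapidary's implementation. It assumes the translated-read alignments (for example from a Diamond blastx run) have already been parsed into a simplified, hypothetical form: a 1-based start position on the reference protein plus the ungapped aligned amino acids. The function name `summarize_hits` and the use of `X` for uncovered positions are choices made here for illustration.

```python
from collections import Counter


def summarize_hits(ref_len, hits):
    """Summarize read-to-protein alignments against one reference protein.

    ref_len: length of the reference protein in amino acids.
    hits: iterable of (start, aligned_aa), where start is the 1-based
          reference position of the first aligned residue and aligned_aa
          is the ungapped translated read segment (assumed input format).
    Returns (coverage, mean_depth, consensus).
    """
    # One residue counter per reference position.
    columns = [Counter() for _ in range(ref_len)]
    for start, aligned_aa in hits:
        for offset, residue in enumerate(aligned_aa):
            pos = start - 1 + offset
            if 0 <= pos < ref_len:
                columns[pos][residue] += 1

    # Depth at a position is the number of reads aligned over it.
    depths = [sum(col.values()) for col in columns]
    # Coverage is the fraction of reference positions hit by any read.
    coverage = sum(1 for d in depths if d > 0) / ref_len
    mean_depth = sum(depths) / ref_len
    # Most likely sequence: the most frequent residue per position,
    # with "X" marking positions that no read covers.
    consensus = "".join(
        col.most_common(1)[0][0] if col else "X" for col in columns
    )
    return coverage, mean_depth, consensus


# Three toy reads aligned to an 8-residue reference.
example_hits = [(1, "MKTA"), (3, "TAYI"), (3, "TSYI")]
coverage, mean_depth, consensus = summarize_hits(8, example_hits)
print(coverage, mean_depth, consensus)
```

Because every count is taken per reference position, a protein whose gene is split across several contigs can still be summarized, provided individual reads align to it.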
