Metalog: curated and harmonised contextual data for global metagenomics samples
Abstract
Metagenomic sequencing enables the in-depth study of microbes and their functions in humans, animals and the environment. While sequencing data is deposited in public databases, the associated contextual data is often incomplete and must be retrieved from primary publications. This lack of access to sample-level metadata, such as clinical data or in situ observations, impedes cross-study comparisons and meta-analyses. We therefore created the Metalog database, a repository of manually curated metadata for metagenomics samples from across the globe. It contains 73,082 samples from humans (including 58,506 of the gut microbiome), 10,703 animal samples, 5,146 ocean water samples, and 21,802 samples from other environmental habitats such as soil, sediment, or fresh water. Samples have been consistently annotated for a set of habitat-specific core features, such as demographics, disease status and medication for humans, host species and captivity status for animals, and filter sizes and salinity for marine samples. Additionally, all original metadata is provided in tabular form, which simplifies focused studies, for example of nutrient concentrations. Pre-computed taxonomic profiles facilitate rapid data exploration, while links to the SPIRE database enable genome-based analyses. The database is freely available for browsing and download at https://metalog.embl.de/.