An Embeddings Fusion Approach Predicts Disease State from Microbiome Features

Abstract

Background

Deep neural networks are a proven technique for working with high-dimensional data because they can draw out meaningful patterns and create vector representations known as “embeddings”. Embeddings capture the semantics and variance of the data, which makes learning tasks on large inputs easier to handle. Microbial community abundance profiles are well suited to an embeddings approach because of their high dimensionality. In this work we introduce a novel approach that generates embeddings from visual representations, which encode NCBI’s taxonomic tree and microbial compositions as images. The resulting embeddings capture factors such as disease status, disease type, and geographical location.

Results

We profiled 13,534 public human metagenomes spanning 85 studies, 24 disease types, 35 countries, and 31,756 microbial species using a profiling pipeline that indexes NCBI’s nucleotide database (nt) across all kingdoms of life. Our model achieves an average classification performance of 84% in distinguishing healthy from disease conditions, 87% for disease types, 99% for body sites, and 88% for geographical locations. It also achieves 97% accuracy in multi-label classification of the four factors combined.

Conclusion

Our work highlights an embeddings approach that encodes multiple features and efficiently contextualizes profiled metagenomes derived from microbiome samples. The model’s embeddings can be used to cluster existing samples by multiple conditions and interpretations. Embeddings for new samples can be created quickly and fitted to the existing clusters to characterize those samples, which has practical applications for unknown, unlabeled microbiome samples.
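
As a rough illustration of fitting a new sample to existing clusters, the sketch below assigns a new embedding to the label whose cluster centroid is most similar to it. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say how clusters are formed or matched, so the nearest-centroid rule, the cosine similarity measure, the function names, and the toy three-dimensional vectors are all hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_label(new_embedding, labeled_embeddings):
    """Return the label whose centroid is most similar to new_embedding.

    Assumption: nearest-centroid matching stands in for whatever
    cluster-fitting procedure the actual model uses.
    """
    centroids = {label: centroid(vecs) for label, vecs in labeled_embeddings.items()}
    return max(centroids, key=lambda label: cosine_similarity(new_embedding, centroids[label]))

# Toy embeddings invented for illustration; real ones would come from the trained model.
reference = {
    "healthy": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "disease": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
predicted = assign_label([0.85, 0.15, 0.05], reference)
```

The same lookup could be repeated per factor (disease type, body site, geographical location) by swapping in a reference dictionary keyed on that factor's labels.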
