CAMDA 2023: finding patterns in urban microbiomes


Abstract

The Critical Assessment of Massive Data Analysis (CAMDA) addresses the challenge of effectively utilizing Big Data in the life sciences. Serving as both a conference and a catalyst for research groups, CAMDA annually presents challenges that foster innovative solutions. For the CAMDA 2023 Forensics challenge, we analyzed 365 metagenomic samples from 16 cities worldwide to characterize their origin. We addressed the forensic challenge from two perspectives: using reduced OTU abundance tables and using functional annotations. To identify the most informative Operational Taxonomic Units (OTUs), we fit negative binomial models, ultimately reducing the number of variables to 294. After OTU selection, we implemented supervised models and conducted 5-fold cross-validation (CV) with a 4:1 training-to-validation ratio in each scenario. Support vector classification (SVC) achieved the highest F1 score (0.96) for the abundance tables, accurately classifying most cities, although New York City (NYC) posed a challenge. Using functional profiles generated with Mifaser at level 4, we achieved the best functional classification with the Neural Network (NN) model. Additionally, to gain insight into further associations between bacterial distribution and other covariates, we applied Dirichlet regression to the abundances of Escherichia, Enterobacter, and Klebsiella. We considered climatic and demographic variables of the cities, observing that population increase is indeed associated with a rise in the mean of Escherichia, while decreasing temperature is linked to higher proportions of Klebsiella. For replicability of the scripts, a Docker container and a Conda environment are available at the repository: GitHub: github.com/ccm-bioinfo/cambda2023
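
The evaluation scheme described above (5-fold CV with a 4:1 training-to-validation ratio, scored by F1) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names `k_fold_indices` and `macro_f1` are hypothetical, the shuffling seed is arbitrary, and macro averaging of the per-city F1 scores is an assumption, since the abstract does not say which averaging was used. No classifier is trained here; the sketch only shows how the splits and the score could be computed.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# 365 samples, as in the challenge data set, split into 5 folds.
splits = list(k_fold_indices(365, k=5))
fold_sizes = [(len(train), len(val)) for train, val in splits]
```

With 365 samples, each of the five folds holds 73 validation samples against 292 training samples, which is exactly the 4:1 ratio. A stratified split that keeps the per-city proportions in every fold would be a reasonable refinement, but whether the authors stratified is not stated.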
