TAGINE: Fast Taxonomy-based Feature Engineering for Microbiome Analysis

Abstract

Summary

TAGINE is a feature engineering algorithm that uses the microbial taxonomic tree to optimize feature sets in microbiome data for predictive modeling. The algorithm starts with features at high taxonomic levels and iteratively splits them into lower-level clades wherever doing so improves predictive accuracy, ultimately producing a feature set that spans multiple taxonomic levels. This approach aims to markedly reduce the number of features while preserving biological relevance and interpretability. We compare TAGINE’s performance with that of standard and taxonomy-based feature engineering methods on several datasets and show that TAGINE yields more compact feature sets and runs orders of magnitude faster than the other methods while maintaining predictive accuracy.
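The top-down splitting idea described above can be illustrated with a small sketch. This is not the TAGINE implementation. It is a minimal greedy variant written under stated assumptions: the taxonomy, the abundance values, the function names (leaves_under, build_table, toy_score, greedy_taxonomic_split) and the acceptance rule (split only on strict score improvement) are all hypothetical, and toy_score is a stand-in for predictive accuracy, which in practice would more likely be a cross-validated model score.

```python
from typing import Callable, Dict, List

Taxonomy = Dict[str, List[str]]   # clade -> child clades (leaves have no entry)
Table = Dict[str, List[float]]    # feature name -> per-sample abundance


def leaves_under(clade: str, taxonomy: Taxonomy) -> List[str]:
    """Return all leaf taxa below a clade (the clade itself if it is a leaf)."""
    children = taxonomy.get(clade, [])
    if not children:
        return [clade]
    return [leaf for child in children for leaf in leaves_under(child, taxonomy)]


def build_table(features: List[str], taxonomy: Taxonomy, leaf_table: Table) -> Table:
    """Aggregate leaf abundances so each feature is the sum over its clade."""
    table = {}
    for clade in features:
        columns = [leaf_table[leaf] for leaf in leaves_under(clade, taxonomy)]
        table[clade] = [sum(values) for values in zip(*columns)]
    return table


def toy_score(table: Table, labels: List[int]) -> float:
    """Assumed stand-in for predictive accuracy: the largest absolute
    difference in mean abundance between the two classes over all features."""
    def mean(xs: List[float]) -> float:
        return sum(xs) / len(xs)

    best = 0.0
    for values in table.values():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        best = max(best, abs(mean(cases) - mean(controls)))
    return best


def greedy_taxonomic_split(
    roots: List[str],
    taxonomy: Taxonomy,
    leaf_table: Table,
    score: Callable[[Table], float],
) -> List[str]:
    """Start from high-level clades and replace a clade with its children
    only when that strictly improves the score (an assumed acceptance rule)."""
    features = list(roots)
    best = score(build_table(features, taxonomy, leaf_table))
    queue = list(features)
    while queue:
        clade = queue.pop(0)
        children = taxonomy.get(clade, [])
        if not children:
            continue  # leaf taxon, nothing to split
        candidate = [f for f in features if f != clade] + children
        candidate_score = score(build_table(candidate, taxonomy, leaf_table))
        if candidate_score > best:
            features, best = candidate, candidate_score
            queue.extend(children)  # children may be split further
    return features


# Hypothetical toy data: two phyla, each with two genera, four samples.
taxonomy = {
    "Firmicutes": ["Lactobacillus", "Clostridium"],
    "Bacteroidetes": ["Bacteroides", "Prevotella"],
}
leaf_table = {
    "Lactobacillus": [1, 1, 5, 5],
    "Clostridium": [5, 5, 1, 1],
    "Bacteroides": [2, 2, 4, 4],
    "Prevotella": [1, 1, 1, 1],
}
labels = [0, 0, 1, 1]

selected = greedy_taxonomic_split(
    ["Firmicutes", "Bacteroidetes"],
    taxonomy,
    leaf_table,
    lambda table: toy_score(table, labels),
)
print(selected)
```

On this toy data the sketch keeps Bacteroidetes at the phylum level and splits Firmicutes into Lactobacillus and Clostridium, because the two genera carry opposite signals that cancel out when summed. The resulting feature set therefore mixes taxonomic levels, which is the behavior the summary describes.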

Availability and Implementation

TAGINE is freely available under the MIT license, with source code at https://github.com/borenstein-lab/tagine_fe.
