Comparing the ability of embedding methods on metabolic hypergraphs for capturing taxonomy-based features

Mattia Cervellini
Blerina Sinaimeri
Catherine Matias
Alessio Martino

Abstract

Background

Metabolic networks are complex systems that describe the biochemical reactions within an organism through pairwise interactions between chemical compounds. While this representation is widely used to study biological function, it fails to capture the full structure of metabolic reactions, many of which involve more than two compounds. Hypergraphs offer a more natural representation, where nodes represent metabolites and hyperedges represent reactions involving multiple participants. Clustering such metabolic hypergraphs can reveal systematic differences among evolutionarily distinct organisms, providing insight into ecological constraints and evolutionary pressures.
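
To make the contrast between the two representations concrete, here is a minimal sketch in Python. The reaction names and metabolites are hypothetical toy data, and the clique expansion used for the pairwise view is only one common way to project a hypergraph onto a graph; the abstract does not say which projection the pairwise networks rely on.

```python
from itertools import combinations

# Hypothetical toy reactions: each hyperedge is the set of metabolites
# taking part in one reaction (not data from the study).
reactions = {
    "R1": frozenset({"glucose", "ATP", "glucose-6-phosphate", "ADP"}),
    "R2": frozenset({"glucose-6-phosphate", "fructose-6-phosphate"}),
}

def clique_expansion(hyperedges):
    """Project a hypergraph onto pairwise edges.

    Every pair of metabolites that co-occur in a reaction becomes an edge.
    This is an assumed projection, chosen for illustration.
    """
    edges = set()
    for members in hyperedges.values():
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges

pairwise = clique_expansion(reactions)
```

Here the four-metabolite reaction `R1` turns into six separate edges, and nothing in `pairwise` records that those six edges came from a single reaction. The hypergraph keeps that grouping explicit.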

Methods

In this study, we investigate how different graph and hypergraph embedding methods influence unsupervised clustering, with the goal of capturing taxonomy-based classes. We applied 14 distinct embedding strategies to a large-scale dataset of 8,467 metabolic hypergraphs. Each embedding was followed by hierarchical clustering with a fixed linkage method. To assess performance, we compared the resulting clusters against known taxonomic groupings.
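
The cluster-then-compare step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract names neither the linkage method nor the agreement measure, so average linkage and cluster purity are stand-ins, and the distance matrix and taxonomy labels are hypothetical.

```python
from collections import Counter

def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed distance matrix.

    Average linkage is an assumption; the source only says the linkage is fixed.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels

def purity(pred, truth):
    """Fraction of items that belong to the majority true class of their cluster."""
    total = 0
    for cid in set(pred):
        classes = [t for p, t in zip(pred, truth) if p == cid]
        total += Counter(classes).most_common(1)[0][1]
    return total / len(truth)

# Hypothetical pairwise distances between four embedded organisms.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
taxonomy = ["Bacteria", "Bacteria", "Archaea", "Archaea"]

pred = average_linkage(dist, n_clusters=2)
score = purity(pred, taxonomy)
```

Working from a precomputed distance matrix keeps the clustering step independent of the embedding, which is what lets different embeddings be compared under the same fixed linkage.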

Results

Our findings show that the choice of hypergraph embedding has a significant effect on clustering outcomes. Among the tested methods, Bag of Hyperedges with Jaccard distance, Histogram Cosine Kernel, and a Hypergraph Auto-Encoder consistently performed best. We also argue that the embedding method should be chosen according to the goal of the downstream task.
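
As a rough illustration of the first of these methods, the sketch below treats each organism as the set of its hyperedges and compares two organisms with the Jaccard distance. This reading of "Bag of Hyperedges" is an assumption based on the name alone; the abstract does not define the representation, and the two toy organisms are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical by convention.
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical organisms, each described by the set of its reactions;
# a reaction is itself a frozenset of metabolite names.
org_a = {frozenset({"A", "B", "C"}), frozenset({"C", "D"})}
org_b = {frozenset({"A", "B", "C"}), frozenset({"D", "E"})}

d = jaccard_distance(org_a, org_b)
```

The two organisms share one reaction out of three distinct ones, so the distance is 1 - 1/3. Distances of this kind, computed for every pair of organisms, could then feed a hierarchical clustering step.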

Version published to 10.1101/2025.07.10.663860 on bioRxiv
Jul 15, 2025
