SummArIzeR: Simplifying cross-database enrichment result clustering and annotation via large language models

Marie Brinkmann
Michael Bonelli
Anela Tosevska

Abstract

Motivation

Enrichment analysis across multiple databases often yields highly redundant results because of overlapping terms, which complicates the interpretation of biological data. To address this, we developed SummArIzeR, an R package that clusters and annotates enrichment results across multiple databases, enabling fast, intuitive interpretation and comparison across multiple conditions. SummArIzeR clusters enrichment results based on shared genes, calculates a pooled p-value for each cluster, and facilitates cluster annotation using large language models. It also provides an easily interpretable visualization of the results.
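
As an illustration only, the shared-gene clustering and pooled p-value steps can be sketched in Python. The abstract does not state which similarity measure, clustering algorithm, or pooling method SummArIzeR uses, so the Jaccard index, the 0.5 threshold, the connected-component clustering, and Fisher's method below are all assumptions, as are the function names and the toy gene sets. This is not the package's API.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(term_genes, threshold=0.5):
    """Group terms whose gene sets overlap at or above `threshold`.

    Clusters are the connected components of the similarity graph,
    tracked with a small union-find structure. The threshold value is
    a hypothetical default, not one taken from the package.
    """
    parent = {term: term for term in term_genes}

    def find(term):
        # Follow parent links to the component root, compressing the path.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for t1, t2 in combinations(term_genes, 2):
        if jaccard(term_genes[t1], term_genes[t2]) >= threshold:
            parent[find(t1)] = find(t2)

    clusters = {}
    for term in term_genes:
        clusters.setdefault(find(term), []).append(term)
    return sorted(sorted(members) for members in clusters.values())


def fisher_pooled_p(pvalues):
    """Pool p-values with Fisher's method (assumed pooling method).

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null, whose survival function
    has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    All p-values must be greater than 0.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Toy enrichment terms from three databases (illustrative gene sets only).
terms = {
    "GO: T cell activation": {"CD3E", "CD4", "LCK", "ZAP70"},
    "KEGG: T cell receptor signaling": {"CD3E", "LCK", "ZAP70", "LAT"},
    "Reactome: Interferon signaling": {"STAT1", "IRF7", "ISG15"},
}
clusters = cluster_terms(terms)
```

In this toy input the two T cell terms share three of five genes and fall into one cluster, while the interferon term stays separate. Fisher's method assumes independent tests, which overlapping terms generally violate, so a real implementation may use a different pooling strategy.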

Results

Compared to existing tools, SummArIzeR provides unbiased and fast cluster annotation using large language models. We demonstrate that SummArIzeR achieves clustering comparable to manual curation while offering superior grouping based on shared underlying genes.

Availability and Implementation

SummArIzeR is available as an open-source R package, with a comprehensive user manual provided in its GitHub repository: https://github.com/bonellilab/SummArIzeR

Version published to 10.1101/2025.05.28.656331 on bioRxiv
Jun 1, 2025

Related articles

Tuning Knowledge Graph Embeddings in Clustering with LISE

This article has 5 authors:
1. Verdiana Schena
2. Simona Colucci
3. Francesco Maria Donini
4. Floriano Scioscia
5. Eugenio Di Sciascio
This article has no evaluations. Latest version: Dec 15, 2025

META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing

This article has 8 authors:
1. Louis-Maël Guéguen
2. Alban Mathieu
3. Simon Pelletier
4. Anthony Woo
5. Namita Misra
6. Magali Moreau
7. Olivier Perin
8. Arnaud Droit
This article has no evaluations. Latest version: Jan 29, 2026

Geranium: Multimodal Retrieval of Genomics Data Visualizations

This article has 6 authors:
1. Huyen N. Nguyen
2. Sehi L'Yi
3. Thomas Chris Smits
4. Shanghua Gao
5. Marinka Zitnik
6. Nils Gehlenborg
This article has no evaluations. Latest version: Dec 27, 2025
