SummArIzeR: Simplifying cross-database enrichment result clustering and annotation via large language models
Abstract
Motivation
Enrichment analysis across multiple databases often yields highly redundant results because terms overlap, which complicates the interpretation of biological data. To address this, we developed SummArIzeR, an R package that clusters and annotates enrichment results across multiple databases, enabling fast, intuitive interpretation and comparison across multiple conditions. SummArIzeR clusters enrichment results based on shared genes, calculates a pooled p-value for each cluster, and facilitates cluster annotation using large language models. It also provides an easily interpretable visualization of the results.
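The idea of clustering terms by shared genes and pooling p-values per cluster can be illustrated with a minimal sketch. This is not SummArIzeR's implementation or API. It assumes Jaccard similarity between gene sets, connected components above a similarity threshold as clusters, and Fisher's method for pooling; the abstract specifies none of these choices. The function names, the threshold, and the toy term and gene labels are all hypothetical.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(term_genes, threshold=0.3):
    """Group terms into connected components of the gene-overlap graph.

    Assumption: two terms are linked when their Jaccard similarity is at
    least `threshold`. The threshold value is illustrative only.
    """
    parent = {t: t for t in term_genes}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(term_genes, 2):
        if jaccard(term_genes[a], term_genes[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in term_genes:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())


def fisher_pooled_p(p_values):
    """Pool p-values with Fisher's method (assumed pooling rule).

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null; for even degrees of freedom
    the survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Toy input: term -> gene set, with made-up labels and p-values.
term_genes = {
    "GO:term_A": {"G1", "G2", "G3", "G4"},
    "KEGG:term_B": {"G1", "G2", "G3", "G5"},
    "Reactome:term_C": {"G6", "G7", "G8"},
    "GO:term_D": {"G6", "G7", "G9"},
}
p_values = {"GO:term_A": 0.01, "KEGG:term_B": 0.01,
            "Reactome:term_C": 0.05, "GO:term_D": 0.2}

clusters = cluster_terms(term_genes)
pooled = [fisher_pooled_p([p_values[t] for t in c]) for c in clusters]
```

Fisher's method assumes independent tests, which terms that share genes do not satisfy, so the pooled values in this sketch are a summary score rather than calibrated p-values.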
Results
Compared to existing tools, SummArIzeR provides unbiased and fast cluster annotation using large language models. We demonstrate that SummArIzeR achieves clustering comparable to manual curation while offering superior grouping based on shared underlying genes.
Availability and Implementation
SummArIzeR is available as an open-source R package at https://github.com/bonellilab/SummArIzeR, where the GitHub repository also provides a comprehensive user manual.