Uncovering biological patterns across studies through automated large-scale reanalyses of public transcriptomic data
Abstract
Large amounts of transcriptomic data have been made available in public repositories. Systematic reanalyses of these data offer the potential to identify conserved biological patterns or context-specific signatures. However, this is a labour-intensive process requiring bioinformatic expertise and a long chain of manual decision-making. The use of large language models (LLMs) and agentic systems holds promise for automating these otherwise time-consuming tasks.
Here, we present UORCA (Unified -Omics Reference Corpus of Analyses), a tool to systematically identify and analyse public transcriptomic datasets. UORCA uses an LLM-assisted framework to search for datasets relevant to a research question. These datasets are analysed through a multi-agent system that performs standardised bioinformatic analyses to identify differentially expressed genes. The results of each analysis are then displayed in an interactive visual interface.
We found that UORCA recapitulated findings reported from a manual comparison of datasets and also identified biological signatures that were not initially described. We also found that UORCA generates targeted hypotheses relevant for drug design and facilitates the evaluation of experimental results where they differ from past literature. Together, these findings demonstrate how UORCA accelerates biomedical discovery by enabling scientists to extract actionable findings from diverse public datasets.