Emerging Use of Agentic AI Systems Across Genomics and Transcriptomics Domains: a Systematic Review
Abstract
Background Genomics and transcriptomics workflows require coordinated use of multiple specialized tools, creating technical barriers for many domain scientists. Large language models (LLMs) have shown promise for supporting bioinformatics tasks, but standalone models lack persistent state, autonomous tool use, and reliable multi-step execution. Agentic AI systems, which extend LLMs with planning, tool integration, and iterative execution, may address these limitations, yet systematic evidence of their application in genomics and transcriptomics remains limited.

Methods We conducted a systematic review following PRISMA guidelines (PROSPERO: CRD420261292811), searching PubMed, Embase, and Web of Science. Eligible studies described LLM-based agentic systems, defined by autonomous multi-step planning, iterative decision-making, and external tool or workflow invocation, that were applied to genomics or transcriptomics tasks and formally evaluated for performance. Two independent reviewers (I.R., A.G.) screened 2,932 unique records.

Results Ten studies (2024–2026) met inclusion criteria, covering applications such as single-cell RNA-seq annotation, CRISPR guide design, Mendelian randomization, biomarker discovery, and automated bioinformatics workflows. Six systems used single-agent architectures and four used multi-agent architectures; all integrated external tools via code execution, retrieval-augmented generation, or domain-specific APIs. GPT-4–family models were the most common backbone (n = 8). Quantitative evaluations (n = 6) reported performance gains of +1.4 to +80 percentage points over baseline LLMs or expert comparators, while qualitative assessments (n = 4) showed high agreement with expert benchmarks. Three architectural patterns emerged: multi-agent designs were associated with tasks involving interpretive uncertainty; structural constraints reduced backbone sensitivity more effectively than model upgrades; and excessive iterative self-revision produced diminishing returns.
Risk of bias was high in 80% of studies, primarily due to small datasets, lack of external validation, and subjective reference standards.

Conclusions Agentic AI systems represent a shift from passive text generation to autonomous analytical orchestration. However, the evidence base remains small and methodologically limited, with no system validated outside its originating research group. Future development should prioritize rigorous external benchmarking on real-world datasets, modular and explainable architectures, and coverage of underrepresented domains, including variant interpretation and spatial transcriptomics.
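
To make the inclusion definition concrete (multi-step planning, iterative decision-making, and external tool invocation), the following is a minimal illustrative sketch in Python. It does not reproduce any system covered by the review. The tool names, the `toy_planner` stand-in for an LLM, and the `max_steps` cap are all assumptions introduced here for illustration; the cap simply bounds iteration, in the spirit of the observation that excessive self-revision yields diminishing returns.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: simple sequence utilities standing in for real
# bioinformatics tools or APIs that an agent might invoke.
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

TOOLS: Dict[str, Callable] = {
    "gc_content": gc_content,
    "reverse_complement": reverse_complement,
}

History = List[Tuple[str, object]]

def toy_planner(task: str, history: History) -> Optional[str]:
    """Stand-in for an LLM planner (assumption): follows a fixed plan and
    returns the next tool name, or None when the plan is finished."""
    plan = ["reverse_complement", "gc_content"]
    return plan[len(history)] if len(history) < len(plan) else None

def run_agent(task: str, seq: str,
              planner: Callable[[str, History], Optional[str]] = toy_planner,
              max_steps: int = 5) -> History:
    """Plan, call a tool, record the result, and repeat up to max_steps."""
    history: History = []
    current = seq
    for _ in range(max_steps):
        tool_name = planner(task, history)
        if tool_name is None:          # planner signals completion
            break
        result = TOOLS[tool_name](current)
        history.append((tool_name, result))
        if isinstance(result, str):    # sequence outputs feed the next step
            current = result
    return history

if __name__ == "__main__":
    for step in run_agent("characterize the antisense strand", "ATGC"):
        print(step)
```

In a real agentic system the planner would be an LLM call that reads the task and the accumulated history before choosing the next tool; here it is a fixed plan so the control flow can be followed without any model dependency.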