scMarkerAgent: An LLM Evidence Agent-based Cell Marker Atlas
Abstract
Evidence-augmented and reliable cell-type annotation remains a major bottleneck in single-cell RNA-seq analysis, particularly for rare, transitional, and disease-associated populations. To address this, we introduce scMarkerAgent, an evidence-grounded cell marker resource developed using an LLM-assisted literature-curation framework. It draws on 294,692 full-text publications to provide 890,296 high-quality cell type–marker annotations covering 50,233 cell types across human, mouse, and rat. It also includes 82,165 curated negative-marker annotations and 417,812 disease-context annotations, which improve disambiguation of homologous cell types and delineation of malignant cells. Every cell type–marker annotation is directly supported by sentence-level literature evidence. In the cell annotation workflow, candidate labels are further refined through an LLM-based reasoning step that jointly evaluates positive and negative markers. Compared with existing resources, scMarkerAgent offers broader coverage of markers, tissues, cell types, and diseases. It is released as a FAIR-compliant database together with a code-free web platform that supports marker retrieval, automated cell annotation, and customizable cell scoring (available at https://www.markeragent.net).
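To make concrete the idea of evaluating positive and negative markers together, the following is a minimal sketch of a signed marker score. It is not the scMarkerAgent implementation or its LLM-based reasoning step. The function names (`marker_score`, `rank_labels`), the `penalty` parameter, the toy marker sets, and the expression values are all assumptions introduced for illustration and are not taken from the database.

```python
from statistics import mean


def marker_score(expression, positive, negative, penalty=1.0):
    """Signed score: mean positive-marker expression minus a penalty
    times mean negative-marker expression.

    Hypothetical scoring rule for illustration only. Genes missing
    from `expression` are treated as having expression 0.0.
    """
    pos = mean([expression.get(g, 0.0) for g in positive]) if positive else 0.0
    neg = mean([expression.get(g, 0.0) for g in negative]) if negative else 0.0
    return pos - penalty * neg


def rank_labels(expression, marker_sets):
    """Score every candidate label and return (label, score) pairs,
    highest score first."""
    scores = {
        label: marker_score(expression, m["positive"], m["negative"])
        for label, m in marker_sets.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy marker sets (illustrative, not retrieved from scMarkerAgent).
marker_sets = {
    "CD4 T cell": {"positive": ["CD3E", "CD4"], "negative": ["CD8A"]},
    "CD8 T cell": {"positive": ["CD3E", "CD8A"], "negative": ["CD4"]},
}

# Toy mean expression for one cluster (made-up values).
cluster = {"CD3E": 2.0, "CD4": 1.5, "CD8A": 0.1}

ranked = rank_labels(cluster, marker_sets)
for label, score in ranked:
    print(f"{label}: {score:.2f}")
```

In this toy case both candidate labels share the positive marker CD3E, so the negative markers are what separate them: high CD4 expression counts against the CD8 T cell label. This is the kind of disambiguation between closely related cell types that negative-marker annotations are meant to support.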