Lemur: A Single-Cell Foundation Model with Fine-Tuning-Free Hierarchical Cell-Type Generation for Drosophila melanogaster
Abstract
Single-cell genomics has revolutionized our understanding of cellular heterogeneity, but automating its analysis remains an open challenge. Cell-type annotation represents a critical bottleneck, particularly as datasets grow in size and complexity. While foundation models have shown promise in addressing this challenge, existing approaches require extensive fine-tuning for effective cell-type annotation. Here, we present Lemur (Large Expression Model for Understanding scRNA-seq), a single-cell foundation model specifically designed for Drosophila melanogaster. Lemur achieves fine-tuning-free cell-type annotation through comprehensive pre-training on an integrated whole-organism atlas with a unified cell-type annotation schema. To leverage this unified schema, we developed a dedicated hierarchical cell-type decoder architecture. This approach enables Lemur to generate consistent cell-type predictions across multiple levels of granularity without requiring additional training on new datasets. The model demonstrates strong performance across diverse tissue types, experimental conditions, and sequencing technologies. It also achieves batch-effect correction without explicitly training for this task. This automated analysis capability positions Lemur as an effective tool for the fly research community. Beyond its immediate applications, Lemur establishes a framework for accelerating biological discovery. It enables rapid iteration between computational predictions and experimental validation in the highly controlled Drosophila melanogaster system, with potential implications for translational research in human biology, particularly in aging and neurodegenerative disease studies.