scGenAI: A generative AI platform with biological context embedding of multimodal features enhances single cell state classification

Abstract

Summary

Single-cell sequencing has advanced the understanding of cellular heterogeneity, yet traditional cell type annotation tools struggle with increasingly complex datasets and multimodal integration. Recent approaches based on large language models (LLMs) offer improved accuracy but rely on pre-trained models that have limited gene vocabularies and lack biological contextualization. These constraints make it difficult to fine-tune pre-trained models effectively, especially for novel datasets, non-human species, or disease-specific studies where unique gene expression patterns are critical. To address these constraints, we developed scGenAI, an LLM-based tool that supports straightforward and flexible de novo training, allowing researchers to incorporate genomic and biofunctional contexts to improve the accuracy and interpretability of cell state prediction. Here, we demonstrate that scGenAI outperforms conventional models in tasks such as cell type prediction and identification of malignant cell states in acute myeloid leukemia (AML), making it a powerful tool for single-cell analysis workflows.

Availability and implementation

scGenAI (DOI: 10.5281/zenodo.14927611) is distributed as an open-source Python package, with source code and comprehensive documentation accessible on GitHub at https://github.com/VOR-Quantitative-Biology/scGenAI.
