DANCE 2.0: Transforming single-cell analysis from black box to transparent workflow
Abstract
Preprocessing is a critical step in single-cell data analysis, yet current practices remain largely a black-box, trial-and-error process driven by user intuition, legacy defaults, and ad hoc heuristics. The optimal combination of steps such as normalization, gene selection, and dimensionality reduction varies across tasks, model architectures, and dataset characteristics, hindering reproducibility and method development. We present DANCE 2.0, an automated and interpretable preprocessing platform featuring two key modules: the Method-Aware Preprocessing (MAP) module, which discovers optimal pipelines for task-specific methods via hierarchical search, and the Dataset-Aware Preprocessing (DAP) module, which recommends pipelines for new datasets via similarity-based matching to a reference atlas. Together, MAP and DAP execute over 325,000 pipeline searches across six major tasks – clustering, cell type annotation, imputation, joint embedding, spatial domain identification, and cell type deconvolution – yielding robust and generalizable recommendations. MAP-recommended pipelines consistently outperform original method defaults, with substantial gains across all tasks. Beyond automation, DANCE 2.0 reveals interpretable preprocessing patterns across tasks, methods, and datasets, transforming preprocessing into a transparent, data-driven process. All resources are openly available at https://github.com/OmicsML/dance to support broad community adoption and future methodological advances.
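To make the idea of similarity-based matching concrete, the following is a minimal sketch, not the DANCE 2.0 implementation. It assumes each dataset can be summarized as a short numeric feature vector and that the reference atlas stores one recommended pipeline per reference dataset. The atlas entries, feature values, step names, and the `recommend_pipeline` helper are all hypothetical, and cosine similarity is an assumed choice of similarity measure.

```python
import math

# Hypothetical reference atlas: each entry pairs summary features of a
# dataset with the pipeline that performed best on it. All names and
# numbers are invented for illustration; none come from DANCE 2.0.
REFERENCE_ATLAS = [
    {
        "name": "ref_a",
        "features": [0.90, 0.10, 0.30],
        "pipeline": ["log_normalize", "highly_variable_genes", "pca"],
    },
    {
        "name": "ref_b",
        "features": [0.20, 0.80, 0.50],
        "pipeline": ["scale_counts", "filter_genes_by_variance", "svd"],
    },
]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_pipeline(query_features, atlas=REFERENCE_ATLAS):
    """Return (name, pipeline) of the atlas entry most similar to the query."""
    best = max(
        atlas,
        key=lambda entry: cosine_similarity(query_features, entry["features"]),
    )
    return best["name"], best["pipeline"]


# Usage: a new dataset whose (hypothetical) features resemble ref_a.
matched_name, matched_pipeline = recommend_pipeline([0.85, 0.15, 0.25])
print(matched_name, matched_pipeline)
```

A nearest-neighbor lookup like this is only one way such matching could work. The features used to describe a dataset and the similarity measure are design choices that the abstract does not specify.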