Stellar quality control for single-cell image-based profiling with coSMicQC
Abstract
Over the past twenty years, high-content imaging has transformed our ability to measure cell phenotypes. The need to bioinformatically process these phenotypes led to the development of a research field called image-based profiling. However, because the standard image-based profiling approach involves averaging data, single-cell quality control (QC) has historically been ignored. The conventional approach of aggregating single cells into bulk profiles conserves computational resources and reduces, but does not completely remove, the impact of low-quality single cells. As software improves in processing scalability, researchers increasingly use single-cell image-based profiling to reveal important signals of phenotypic heterogeneity. This shift of image-based profiling toward single cells calls for single-cell QC standards that ensure observed morphology differences are driven by biology and not by technical interference. We address these challenges with coSMicQC (Single cell Morphology Quality Control), a reproducible Python package with comprehensive tutorials that supports systematic filtering of low-quality single cells. coSMicQC integrates into standard image-based profiling protocols and provides an interactive, Jupyter-compatible user interface to detect technical outliers and set automatic thresholds. We applied coSMicQC to multiple real-world datasets and showed that our software improves data quality, detects mycoplasma contamination, and boosts phenotype prediction. When applied to a large-scale drug screening dataset, coSMicQC rescued lead compounds that would otherwise have been missed. Overall, coSMicQC is a reliable and scalable method that removes technical outliers to reduce noise and improve image-based profiling insights.
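To make the idea of threshold-based single-cell filtering concrete, the following is a minimal sketch of one common approach: standardize a morphology feature to z-scores and flag cells beyond a chosen cutoff. It is not the coSMicQC API, and the abstract does not state which outlier criterion the package uses. The function names, the feature name `Nuclei_AreaShape_Area`, the toy values, and the signed-threshold convention are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no outlier information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_outliers(cells, feature, threshold):
    """Return one boolean per cell; True marks a suspected technical outlier.

    Assumed convention (hypothetical, for this sketch only): a positive
    threshold flags cells with z-score above it, a negative threshold
    flags cells with z-score below it.
    """
    scores = zscores([cell[feature] for cell in cells])
    if threshold >= 0:
        return [z > threshold for z in scores]
    return [z < threshold for z in scores]


# Toy single-cell table: six typical nuclei and one abnormally large object,
# such as a mis-segmented clump of nuclei. Values are invented.
cells = [
    {"Nuclei_AreaShape_Area": area}
    for area in [100, 105, 98, 102, 99, 101, 400]
]

flags = flag_outliers(cells, "Nuclei_AreaShape_Area", threshold=2.0)

# Keep only the cells that were not flagged.
kept = [cell for cell, is_outlier in zip(cells, flags) if not is_outlier]
```

In this toy table only the oversized object exceeds the cutoff, so the six typical nuclei are kept. A real workflow would combine several features per condition and inspect the flagged cells visually before removing them.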