Scalable data harmonization for single-cell image-based profiling with CytoTable

Abstract

High-content imaging (HCI) involves the automated acquisition and quantitative analysis of cell phenotypes from microscopy images. HCI has become a vital tool in biomedical research, supporting studies that characterize disease mechanisms and identify new therapeutic agents. These studies often rely on high-throughput screening, which can involve thousands of chemical or genetic perturbations and produce terabytes of microscopy data. To extract meaningful biological insights, these data must be processed into quantitative features through a technique known as image-based profiling. A major analytical bottleneck is curating the high-dimensional, single-cell data derived from image analysis tools such as CellProfiler [1,2], DeepProfiler [3], Revvity Harmony, and Molecular Devices IN Carta [4]. These datasets suffer from inconsistent data schemas, inefficient file formats (e.g., CSV, TSV, SQLite), and undocumented ontological relationships (the links between data objects), which make data processing and standardization difficult. These challenges reduce reproducibility and slow progress in downstream applications. To solve these issues, we introduce CytoTable [5], a software package for harmonizing single-cell image-based profiling data. CytoTable enables modular, portable, and cross-language data integration through a software stack comprising: (a) intermediate data curation that harmonizes outputs from multiple image analysis tools using Apache Arrow [6], (b) serialized, Arrow-compatible storage using Apache Parquet [7], (c) high-performance SQL-based queries using DuckDB [8], (d) a scalable execution engine using Parsl [9] with Pythonic MapReduce [10] design patterns, and (e) a Python-based execution runtime. By combining these technologies, CytoTable increases the efficiency, accessibility, and interoperability of single-cell image-based profiling workflows.
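The MapReduce-style harmonization described above can be illustrated with a small, standard-library-only Python sketch. It is not CytoTable's implementation or API. The two tool names, their column names, the shared schema in `SCHEMA_MAPS`/`SCHEMA_TYPES`, and the helper functions are all hypothetical. Plain Python lists stand in for Arrow tables, and a `ThreadPoolExecutor` stands in for a Parsl executor. The map step parses each tool's CSV export, then renames and casts its columns to one typed schema. The reduce step concatenates the results.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical column mappings from two image analysis tools onto one shared
# schema. These names are illustrative, not CytoTable presets.
SCHEMA_MAPS = {
    "tool_a": {"ImageNumber": "image_id", "ObjectNumber": "cell_id",
               "AreaShape_Area": "cell_area"},
    "tool_b": {"Image": "image_id", "Object ID": "cell_id",
               "Cell Area": "cell_area"},
}
# Target type for each shared column, so every source yields the same types.
SCHEMA_TYPES = {"image_id": int, "cell_id": int, "cell_area": float}


def map_source(source):
    """Map step: parse one tool's CSV text and harmonize its columns."""
    tool, text = source
    mapping = SCHEMA_MAPS[tool]
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Keep only mapped columns, rename them, and cast to the shared types.
        row = {mapping[k]: SCHEMA_TYPES[mapping[k]](v)
               for k, v in record.items() if k in mapping}
        row["source_tool"] = tool  # record provenance for each row
        rows.append(row)
    return rows


def harmonize(sources, max_workers=4):
    """Run the map step in parallel, then reduce by concatenating results."""
    # A thread pool stands in for a Parsl executor in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(map_source, sources))
    return reduce(lambda acc, rows: acc + rows, mapped, [])


sources = [
    ("tool_a", "ImageNumber,ObjectNumber,AreaShape_Area\n1,1,250\n1,2,310\n"),
    ("tool_b", "Image,Object ID,Cell Area\n7,1,198\n"),
]
table = harmonize(sources)
for row in table:
    print(row)
```

Because each source is mapped independently, the map step parallelizes cleanly. Keeping a `source_tool` column lets downstream steps trace every harmonized row back to the tool that produced it.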
In essence, CytoTable is a robust, reproducible, and scalable engine that harmonizes single-cell readouts from multiple image analysis tools and prepares them for feature integration with software in the Cytomining ecosystem, such as Pycytominer [11].
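Joining per-compartment tables into one row per cell is one example of the SQL-based query layer described in the abstract. The sketch below uses Python's built-in `sqlite3` module only so it runs without extra dependencies. DuckDB accepts the same standard SQL join syntax. The tables use a simplified, hypothetical CellProfiler-style layout (image, cells, nuclei, cytoplasm) in which each cytoplasm object records its parent cell and nucleus. The exact tables and join keys CytoTable uses may differ.

```python
import sqlite3

# In-memory SQLite stands in for DuckDB; both accept standard SQL joins.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE image (ImageNumber INTEGER, Metadata_Well TEXT);
    CREATE TABLE cells (ImageNumber INTEGER, ObjectNumber INTEGER, Cells_Area REAL);
    CREATE TABLE nuclei (ImageNumber INTEGER, ObjectNumber INTEGER, Nuclei_Area REAL);
    CREATE TABLE cytoplasm (
        ImageNumber INTEGER, ObjectNumber INTEGER,
        Parent_Cells INTEGER, Parent_Nuclei INTEGER, Cytoplasm_Intensity REAL
    );
    INSERT INTO image VALUES (1, 'A01');
    INSERT INTO cells VALUES (1, 1, 420.0), (1, 2, 515.0);
    INSERT INTO nuclei VALUES (1, 1, 110.0), (1, 2, 132.0);
    INSERT INTO cytoplasm VALUES (1, 1, 1, 1, 0.31), (1, 2, 2, 2, 0.27);
    """
)

# One row per cell: each cytoplasm object links to its parent cell and
# nucleus within the same image, and the image row adds well metadata.
query = """
SELECT i.Metadata_Well, cy.ImageNumber, cy.Parent_Cells AS cell_id,
       c.Cells_Area, n.Nuclei_Area, cy.Cytoplasm_Intensity
FROM cytoplasm AS cy
JOIN cells AS c
  ON c.ImageNumber = cy.ImageNumber AND c.ObjectNumber = cy.Parent_Cells
JOIN nuclei AS n
  ON n.ImageNumber = cy.ImageNumber AND n.ObjectNumber = cy.Parent_Nuclei
JOIN image AS i ON i.ImageNumber = cy.ImageNumber
ORDER BY cell_id
"""
single_cells = con.execute(query).fetchall()
for row in single_cells:
    print(row)
con.close()
```

Making the parent-child links explicit in the join condition documents the relationships between data objects rather than leaving them implicit in column names.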