Cellquant: a vibecoder’s guide to image analysis

This article has been reviewed by the following groups


Abstract

Quantitative fluorescence microscopy is central to modern cell biology, yet extracting reproducible measurements from images remains a bottleneck for biologists without programming experience. Here we present cellquant, a single-script command-line pipeline for multi-channel fluorescence images that performs cell segmentation, puncta quantification, colocalization analysis, and spatial proximity measurements. Because the interface is entirely text-based, the exact command used to generate any result can be recorded and re-executed. We validate cellquant on two biological systems. In human HCT116 cells, the pipeline quantified arsenite-induced stress granule formation. In budding yeast, simultaneous measurement of nucleolar morphology, colocalization, and spatial proximity across a temperature gradient revealed a coordinated sequence of nucleolar reorganization. Applying principal component analysis (PCA) and UMAP to the multi-parameter output of cellquant resolved a continuous cell-state transition across the temperature gradient, with condensate redistribution and nucleolar morphology defining orthogonal axes. The pipeline produces publication-ready quantification with visual quality control and statistically rigorous replicate analysis. All code, documentation, and example datasets are freely available.

Article activity feed

  1. The cellquant pipeline is distributed as a single Python script alongside an environment specification (environment.yml)

    I'm not sure there is any benefit to organizing this software as a single script rather than as a modern Python package with a modular structure and a single CLI entrypoint. Single large scripts (this one is ~2,300 lines) are hard for both humans and agents to work with: humans find them difficult to navigate, and two agents cannot work concurrently on unrelated parts of the code, since both would be editing the same file.
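    As an illustrative sketch only, the same functionality could be exposed through one entrypoint that dispatches to argparse subcommands, each implemented in its own module. The subcommand and option names below are hypothetical, not cellquant's actual interface:

    ```python
    # Hypothetical CLI entrypoint for a modular package layout; subcommand
    # and option names are invented for illustration, not cellquant's own.
    import argparse

    def build_parser():
        parser = argparse.ArgumentParser(prog="cellquant")
        sub = parser.add_subparsers(dest="command", required=True)

        seg = sub.add_parser("segment", help="run cell segmentation")
        seg.add_argument("--diameter", type=float, default=30.0,
                         help="expected cell diameter in pixels")

        pun = sub.add_parser("puncta", help="detect and quantify puncta")
        pun.add_argument("--min-size", type=int, default=3,
                         help="minimum punctum area in pixels")
        return parser
    ```

    Each subcommand's implementation would then live in its own module (e.g. a hypothetical `cellquant/segment.py`), so a human can navigate the code by file and two agents can edit unrelated analysis steps without touching the same file.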

  2. A complete command line interface (CLI) reference (CLI_REFERENCE.md) documents every argument with its default value, valid range, and interaction with cell-type presets. This reference is structured to be readable by both humans and chatbots, so that a user can paste it into a conversation and ask the AI to find the relevant parameter.

    This kind of documentation is a double-edged sword, as it creates two sources of truth that are hard to keep in sync with the code (this is true for both humans and agents, both of whom tend to forget to update the docs). I would suggest eliminating this document in favor of writing and maintaining thorough docstrings in the code itself, which humans can introspect in their IDE and which agents will "know" to read without any prompting.

  3. Distributing the pipeline as a single Python script within a minimal repository eliminates the installation complexity that often derails non-programmers before they reach the analysis itself.

    In my experience, the installation complexity referenced here usually stems from creating and managing virtual environments, and distributing this software as a single script in a GitHub repo will not eliminate that (indeed it arguably makes it worse). I would suggest that structuring this script as a Python package and distributing it via the usual channels (PyPI and conda-forge) would likely facilitate, rather than hinder, its adoption.
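    For concreteness, a minimal `pyproject.toml` under that suggestion might look like the following — the package name, dependency list, and module path are all assumptions for illustration:

    ```toml
    [build-system]
    requires = ["hatchling"]
    build-backend = "hatchling.build"

    [project]
    name = "cellquant"
    version = "0.1.0"
    description = "Multi-channel fluorescence image quantification pipeline"
    dependencies = ["numpy", "scikit-image"]

    [project.scripts]
    # installs a `cellquant` command wired to a main() in cellquant/cli.py
    cellquant = "cellquant.cli:main"
    ```

    Published to PyPI, installation would reduce to `pip install cellquant` inside any virtual environment, and a conda-forge recipe could build from the same metadata.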

  4. This is not a claim that AI can replace computational biologists. Complex pipelines, novel algorithms, and performance-critical applications will continue to require programming expertise. Rather, we argue that for standard analyses (segmentation, colocalization, puncta counting, spatial measurements), the combination of a well-designed CLI tool and an AI assistant is sufficient for a biologist to produce rigorous, reproducible quantification. The biologist’s expertise in recognizing correct segmentation and evaluating biological plausibility remains essential, complementing the AI’s ability to translate natural-language descriptions into parameter configurations.

    This is an important (and appreciated!) callout. Have you thought about the types of test questions a biologist should ask to check AI-generated results and analysis of the dataset?

  5. The pipeline does not currently support batch parallelization, processing images sequentially. For large screening datasets this may be a practical limitation, though the per-image processing time (seconds to minutes depending on image size and analysis modules) is acceptable for most experimental workflows.

    Nice work! This seems like a useful tool for making image analysis more accessible.

    I was curious whether there was a specific reason for the limitation you mention here: "The pipeline does not currently support batch parallelization, processing images sequentially."

    My first thought was that segmentation might be the main constraint (especially when using Cellpose). But it seems like one possible architecture would be to run segmentation first and save masks, then parallelize the downstream steps (puncta detection, colocalization, morphology measurements, etc.) across images or cells.

    That part of the pipeline seems like it could be relatively straightforward for Claude to parallelize with something like multiprocessing or joblib and might significantly improve throughput for larger datasets, since downstream analysis would no longer be tied to per-image segmentation.
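    A hedged sketch of that two-phase architecture — segmentation runs first and saves masks, then the independent per-image downstream analysis is fanned out over worker processes. All function names below are placeholders, not part of cellquant:

    ```python
    # Illustrative only: parallelize downstream analysis over precomputed
    # (image, mask) pairs, keeping GPU-bound segmentation out of the pool.
    from concurrent.futures import ProcessPoolExecutor

    def analyze_image(pair):
        """Per-image downstream analysis (puncta, colocalization, morphology)."""
        image_path, mask_path = pair
        # ... load the image and its precomputed mask, measure per cell ...
        return {"image": image_path, "mask": mask_path}  # placeholder result

    def run_batch(pairs, workers=4):
        # Each (image, mask) pair is independent, so process-level
        # parallelism is safe; results return as a list of per-image dicts.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(analyze_image, pairs))
    ```

    `joblib.Parallel` would work equally well here; the key design point is just that once masks are on disk, no downstream step depends on any other image.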

    Was something like this considered, or are there constraints (e.g., Cellpose behavior, I/O, memory usage) that make parallelization less practical?