A scalable, data analytics workflow for image-based morphological profiles

Abstract

Cell Painting is an established community-based, microscopy-assay platform that provides high-throughput, high-content data for biological readouts. In November 2022, the JUMP-Cell Painting Consortium released the largest annotated, publicly available dataset, comprising more than 2 billion cell images. This dataset is designed for predicting the activity and toxicity of 100k drug compounds, with the aim of making cell images as computable as genomes and transcriptomes.

In this paper, we have developed a data analytics workflow that is both scalable and computationally efficient, while providing significant, biologically relevant insights for biologists estimating and comparing the effects of different drug treatments.

We propose two main contributions: 1) a simple yet sophisticated, scalable data analytics metric that uses negative controls to compare morphological cell profiles, which we call the equivalence score (Eq. score); and 2) a workflow to identify and amplify subtle changes in morphological image profiles caused by drug treatments, relative to the negative controls. In summary, we provide a data analytics workflow to assist biologists in interpreting high-dimensional image features, not necessarily limited to morphological ones. This enhances the efficiency of drug candidate screening, thereby streamlining the drug development process. By improving our understanding of how to use complex image-based data, we can decrease the cost and time needed to develop new, life-saving treatments.

Author summary

Microscopy assays are often used to study cell responses to treatments in the search for new drugs. In this paper, we present a method that simplifies the understanding of the data generated from such assays. The data in this study consist of 750 morphological features, extracted from the images, which describe the traits and characteristics of the cells. By using untreated cells as a biological baseline, we are able to detect subtle changes that occur in the treated cells. These changes are then transformed into an equivalence score (Eq. score), a metric that lets us compare the similarities among different treatments relative to our baseline of untreated cells. Our Eq. score approach transforms complex, high-dimensional data about cell morphology into something more interpretable and understandable. It reduces the “noise” in the features and highlights important changes, the “signal”. Our method can be integrated into existing workflows, helping researchers understand and interpret complex morphological data derived from cell images more easily. Understanding cell morphology is crucial to deepening our knowledge of biological systems. Ultimately, this could contribute to the faster and more cost-effective development of new, life-saving treatments.
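The idea of using untreated cells as a baseline can be illustrated with a small sketch. The summary does not define how the Eq. score itself is computed, so the code below is not the Eq. score. It only shows a generic control-based z-score on synthetic data, under the assumption that features are stored as a samples-by-features array. The function name `normalize_to_controls`, the sample counts, and the simulated treatment effect are all hypothetical.

```python
import numpy as np


def normalize_to_controls(features, is_control):
    """Z-score every feature against the negative-control samples.

    Generic illustration only; this is NOT the Eq. score from the paper.
    """
    features = np.asarray(features, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    controls = features[is_control]
    mean = controls.mean(axis=0)
    std = controls.std(axis=0, ddof=1)
    # Guard against constant features, which would give a zero divisor.
    std = np.where(std == 0, 1.0, std)
    return (features - mean) / std


# Synthetic data: 750 features per sample, as in the summary above.
rng = np.random.default_rng(0)
n_controls, n_treated, n_features = 200, 50, 750
controls = rng.normal(0.0, 1.0, size=(n_controls, n_features))
treated = rng.normal(0.0, 1.0, size=(n_treated, n_features))
treated[:, :10] += 3.0  # hypothetical treatment effect on 10 features

features = np.vstack([controls, treated])
is_control = np.arange(len(features)) < n_controls

z = normalize_to_controls(features, is_control)
# Average profile of the treated samples, in units of control std.
treated_profile = z[~is_control].mean(axis=0)
```

In this toy setting, the ten shifted features stand out in `treated_profile` while the remaining features stay close to zero, which is the kind of signal-versus-noise contrast the summary describes.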

Article activity feed

  1. we create Eq. scores for all 303 compounds in the dataset and use them as new features

    What kind of time cost is associated with the training given the scale of this dataset?