Toward generalizable phenotype prediction from single-cell morphology representations

Jenna Tomkinson
Roshan Kern
Cameron Mattson
Gregory P. Way


Abstract

Functional cell processes (e.g., molecular signaling, response to environmental stimuli, and mitosis) impact cell phenotypes, which scientists can easily and robustly measure with cell morphology. However, linking these morphology measurements with phenotypes remains challenging because biologically interpretable phenotypes require manually annotated labels. Automatic phenotype annotation from cell morphology would link biological processes with their phenotypic outcomes and deepen understanding of cell function. We propose that nuclear morphology can be a predictive marker for cell phenotypes that is generalizable across cell types. Nucleus morphology is commonly and easily accessible with microscopy, but annotating specific phenotypic information requires labels. Therefore, we reanalyzed a pre-labeled, publicly available nucleus microscopy dataset from the MitoCheck consortium to predict single-cell phenotypes. We extracted single-cell morphology features using CellProfiler and DeepProfiler, which provide fast, robust, and generalizable data processing pipelines. We trained multinomial, multi-class elastic net logistic regression models to classify nuclei into one of 15 phenotypes such as ‘Anaphase’, ‘Apoptosis’, and ‘Binuclear’. In a held-out test set, we observed an overall F1 score of 0.84, where individual phenotype scores ranged from 0.64 (indicating moderate performance) to 0.99 (indicating high performance). Notably, phenotypes such as ‘Elongated’, ‘Metaphase’, and ‘Apoptosis’ showed high performance. While CellProfiler and DeepProfiler morphology features were generally equally effective, combining feature spaces yielded the best results for 9 of the 15 phenotypes. However, leave-one-image-out (LOIO) cross-validation analysis showed a significant performance decline, indicating that our model could not reliably predict phenotypes in new single images.
This poor performance, which we show was unrelated to factors like illumination correction or model selection, limits generalizability to new datasets and highlights the challenges of morphology-to-phenotype annotation. Nevertheless, we modified and applied our approach to the JUMP Cell Painting pilot data. Our modified approach improved dataset alignment and highlighted many perturbations that are known to be associated with specific phenotypes. We propose several strategies that could pave the way for more generalizable methods in single-cell phenotype prediction, which is a step toward morphology representation ontologies that would aid in cross-dataset interpretability.
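
The two evaluation ideas named in the abstract, leave-one-image-out (LOIO) splitting and per-phenotype F1 scores, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, image identifiers, and toy labels below are all hypothetical, and the abstract does not state how the overall F1 score is averaged, so only per-class scores are computed here.

```python
from collections import defaultdict


def leave_one_image_out(image_ids):
    """Yield (held_out_image, train_idx, test_idx), holding out one image at a time.

    All cells from the held-out image go to the test split, so no image
    contributes cells to both training and testing.
    """
    by_image = defaultdict(list)
    for idx, image in enumerate(image_ids):
        by_image[image].append(idx)
    for image, test_idx in by_image.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(image_ids)) if i not in held_out]
        yield image, train_idx, test_idx


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN) for each label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data: five cells drawn from three images.
image_ids = ["img_a", "img_a", "img_b", "img_b", "img_c"]
y_true = ["Anaphase", "Apoptosis", "Anaphase", "Binuclear", "Apoptosis"]
y_pred = ["Anaphase", "Apoptosis", "Apoptosis", "Binuclear", "Apoptosis"]

for image, train_idx, test_idx in leave_one_image_out(image_ids):
    print(image, "train:", train_idx, "test:", test_idx)

print(per_class_f1(y_true, y_pred))
```

In a real analysis, a classifier would be refit on each training split and scored on the held-out image. A gap between scores from a random held-out split and scores from LOIO splits is the kind of decline the abstract reports.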

Version published to 10.1101/2024.03.13.584858 on bioRxiv
Mar 13, 2024

Discovering cell types and states from reference atlases with heterogeneous single-cell ATAC-seq features

This article has 2 authors:
1. Xiuwei Zhang
2. Yuqi Cheng
This article has no evaluations
Latest version Dec 10, 2025

Accurate, scalable, and unified single-cell atlas integration with scBIOT

This article has 2 authors:
1. Haihui Zhang
2. Peiwu Qin
This article has no evaluations
Latest version Jan 19, 2026

Multimodal Data Fusion Reveals Morpho-Genetic Variations in Human Cortical Neurons Associated with Tumor Infiltration

This article has 18 authors:
1. Hanchuan Peng
2. Yufeng Liu
3. Zhixi Yun
4. Lingli Zhang
5. Wen Ye
6. Kaifeng Chen
7. Xiefeng Wang
8. Mengzhu Ou
9. Jing Rong
10. Xiaomin Yang
11. Lei Mao
12. Chiyuan Ma
13. Liang Chen
14. Ying Mao
15. Nan Ji
16. Liwei Zhang
17. Yongping You
18. Junxia Zhang
This article has no evaluations
Latest version Dec 29, 2025
