Similarity metric learning on perturbational datasets improves functional identification of perturbations

Ian Smith
Petr Smirnov
Benjamin Haibe-Kains

This article has been Reviewed by the following groups

Listed in

Evaluated articles (preLights)

Abstract

Analysis of high-throughput perturbational datasets, including the Next Generation Connectivity Map (L1000) and the Cell Painting projects, uses similarity metrics to identify perturbations or disease states that induce similar changes in the biological feature space. Similarities among perturbations are then used to identify drug mechanisms of action, to nominate therapeutics for a particular disease, and to construct biological networks among perturbations and genes. Standard similarity metrics include correlation, cosine distance and gene set enrichment methods, but these methods operate directly on the measured features, without refining the measurement space through a transformation. We introduce Perturbational Metric Learning (PeML), a weakly supervised similarity metric learning method that learns a data-driven similarity function which maximizes discrimination of replicate signatures by transforming the biological measurements into an intrinsic, dataset-specific basis. The learned similarity functions substantially improve the recovery of known biological relationships, such as mechanism of action identification. In addition to capturing a more meaningful notion of similarity, data in the transformed basis can be used for other analysis tasks, such as classification and clustering. Similarity metric learning is a powerful tool for the analysis of large biological datasets.
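The core idea, replacing a fixed metric such as cosine similarity with one learned from replicate structure alone, can be illustrated with a deliberately simple stand-in. The Python sketch below is not PeML, whose model is not described in this abstract. It assumes a diagonal metric that weights each feature by the ratio of its total variance to its pooled within-replicate variance, and it scores a metric by a top-1 replicate recall. The helper names (`learn_feature_weights`, `transform`, `replicate_top1_recall`) and the toy signatures are hypothetical, chosen for illustration only.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def cosine(u, v):
    """Baseline metric: cosine similarity on the measured features as-is."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def learn_feature_weights(signatures, labels, eps=1e-8):
    """Hypothetical helper: learn one weight per feature from replicate labels only.

    weight_j = total variance of feature j / pooled within-replicate variance.
    Features that are reproducible across replicates of the same perturbation
    but vary across perturbations get large weights (a diagonal metric).
    """
    groups = defaultdict(list)
    for sig, label in zip(signatures, labels):
        groups[label].append(sig)

    weights = []
    for j in range(len(signatures[0])):
        total_var = pvariance([s[j] for s in signatures])
        # Pooled within-replicate variance: squared deviations from each group mean.
        within_sq = []
        for members in groups.values():
            group_mean = mean(s[j] for s in members)
            within_sq.extend((s[j] - group_mean) ** 2 for s in members)
        weights.append(total_var / (mean(within_sq) + eps))
    return weights


def transform(sig, weights):
    """Hypothetical helper: rescale each feature by sqrt(weight) into the learned basis."""
    return [x * math.sqrt(w) for x, w in zip(sig, weights)]


def replicate_top1_recall(signatures, labels, sim):
    """Fraction of signatures whose most similar *other* signature is a replicate."""
    hits = 0
    for i, (sig_i, label_i) in enumerate(zip(signatures, labels)):
        others = [j for j in range(len(signatures)) if j != i]
        best = max(others, key=lambda j: sim(sig_i, signatures[j]))
        hits += labels[best] == label_i
    return hits / len(signatures)


if __name__ == "__main__":
    # Made-up toy data: feature 0 carries the perturbation signal,
    # feature 1 is large noise that is not reproduced across replicates.
    sigs = [(1, 5), (1, -5), (-1, 5), (-1, -5)]
    labels = ["A", "A", "B", "B"]

    print(replicate_top1_recall(sigs, labels, cosine))  # 0.0

    w = learn_feature_weights(sigs, labels)

    def learned(u, v):
        # Cosine similarity computed in the reweighted (learned) basis.
        return cosine(transform(u, w), transform(v, w))

    print(replicate_top1_recall(sigs, labels, learned))  # 1.0
```

On this toy data, plain cosine pairs every signature with a non-replicate because the noisy second feature dominates the angle, while the reweighted metric recovers every replicate. A per-feature weighting cannot rotate or mix features; a full linear transform would be more expressive, and this sketch keeps to diagonal weights only for brevity.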

preLights
Aug 17, 2023

Excerpt
Weak supervision - Strong results! Smith and colleagues introduce Perturbational Metric Learning (PeML), a weakly supervised similarity metric learning method to extract biological relationships from noisy high-throughput perturbational datasets.

Version published on bioRxiv (doi: 10.1101/2023.06.09.544397)
Jun 11, 2023

Related articles

Discovering cell types and states from reference atlases with heterogeneous single-cell ATAC-seq features

This article has 2 authors:
1. Xiuwei Zhang
2. Yuqi Cheng
This article has no evaluations. Latest version Dec 10, 2025

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations. Latest version Dec 11, 2025

Uncovering miRNA–Disease Associations Through Graph Based Neural Network Representations

This article has 1 author:
1. Alessandro Orro
This article has no evaluations. Latest version Jan 28, 2026
