PXN Unlocks the Power of Public Gene Expression Data Through Cross-Technology Integration

Abstract

The immense value of public gene expression repositories is constrained by the lack of compatibility among datasets generated from diverse experimental technologies. Differences in measurement scales, probe chemistries, and signal distributions create systematic discrepancies across platforms and laboratories. These inconsistencies make large-scale integrative analysis nearly impossible, even though such studies could achieve great statistical power and improved reproducibility. We introduce PXN, a probabilistic machine learning framework that captures a unified representation of biological signal across multiple gene expression technologies. Once trained, PXN can seamlessly translate data between multiple platforms, preserving informative biological variation while removing technology-specific biases. In benchmarking studies, PXN consistently outperforms existing normalization methods in cross-platform accuracy and substantially enhances the power of differential expression analysis. Importantly, we show that PXN is powerful enough to bridge even the most challenging technological divide—between microarray and RNA-seq. This capability provides a scalable route for integrating legacy microarray data with modern RNA-seq studies. By enabling direct comparison and integration of heterogeneous datasets, PXN unlocks the full potential of public repositories for future biological discovery and therapeutic innovation.
