A Novel Algorithm for the Harmonization of Pan-cancer Proteomics
Abstract
Proteomic characterization of cancer tissues can advance therapeutic options and reveal novel biomarkers by unlocking insights available only at the proteome level. However, proteomics data analysis is greatly challenged by systematic technical variability in experimental protocols, instrumentation, and data processing, which restricts comparisons between studies. With the continued and unprecedented growth of proteomics datasets, a comprehensive strategy for harmonizing these datasets is necessary to enable large-scale integrative analyses. Herein, we describe a novel framework for pan-cancer harmonization and imputation that offers the scientific community an updated approach to this challenge. Rather than relying on a single batch-effect correction algorithm, our multi-step approach accurately addresses each critical source of systematic difference with a tailored solution, including standardized reanalysis of raw data and an autoencoder for pan-cancer integration. By introducing a suite of benchmarks, we bridge a critical gap in the reliable evaluation of harmonization. Using this framework, we created a harmonized pan-cancer dataset and demonstrated its superiority over existing solutions and previous pan-cancer harmonization efforts. We further demonstrated its utility in revealing prognostic markers, estimating indication-wide biomarker prevalence, and facilitating target discovery for cancer subtypes. We expect our work to provide powerful tools supporting proteomics research for precision cancer medicine.
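The abstract contrasts the multi-step framework with approaches that rely on a single batch-effect correction algorithm. For orientation only, the sketch below shows one generic single-step baseline of that kind: per-batch median centering of log2 protein intensities. It is not the authors' method. The function name `median_center_by_batch` and the data layout (proteins as rows, samples as columns, NaN for missing values) are hypothetical assumptions made for illustration.

```python
import numpy as np


def median_center_by_batch(log_intensity: np.ndarray, batches: list[str]) -> np.ndarray:
    """Hypothetical single-step baseline: per-batch median centering.

    log_intensity: proteins x samples matrix of log2 intensities (NaN = missing).
    batches: batch (e.g. study) label for each sample column.
    Returns a copy in which each protein's median within each batch is zero.
    """
    corrected = log_intensity.astype(float).copy()
    labels = np.asarray(batches)
    for batch in np.unique(labels):
        cols = labels == batch
        # Per-protein median within this batch, ignoring missing values.
        med = np.nanmedian(corrected[:, cols], axis=1, keepdims=True)
        # Remove the additive per-batch offset; missing values stay NaN.
        corrected[:, cols] -= med
    return corrected


# Usage: two batches whose first protein differs by a constant offset of 4.
x = np.array([[10.0, 11.0, 14.0, 15.0],
              [5.0, np.nan, 9.0, 8.0]])
out = median_center_by_batch(x, ["A", "A", "B", "B"])
```

A correction like this removes only additive per-batch offsets. It does not address differences rooted in protocols, raw-data processing, or missing values. If batches are confounded with biology, as when each study covers a single cancer type, it also removes genuine biological differences along with the technical ones.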