Model-based Standardization of Correlation Coefficients Improves Multi-Omic Clustering and Biological Signal Discovery


Abstract

Multi-omic data pose a particular challenge for Weighted Correlation Network Analysis (WCNA or WGCNA) due to platform- or batch-specific characteristics, such as resolution, accuracy, dynamic range, and sources of spurious variation. When unaccounted for, these differences can bias clustering toward single-batch clusters and make it more sensitive to "noisier" batches. Here we propose mitigating these effects using null models fitted separately to the bulk of analyte-analyte correlations within each batch and across each pair of batches. We then map the batch-specific null models to a standard null model, removing batch-dependent distributional differences. This approach is compatible with any correlation-based clustering method. Since the null model represents information not captured in individual pairwise correlations, we show how to incorporate this additional information into both distance-based clustering and WCNA. For distance-based clustering, we increase distances corresponding to correlations consistent with the null model. For WCNA, we provide a new soft-threshold (adjacency) function based on the likelihood of a correlation under the null model. The resulting network can be easily incorporated into the WCNA workflow. These methods are implemented in the R package standardcor, and we illustrate the package on simulated data as well as an existing multi-omic dataset.
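
The idea of fitting a null model to the bulk of correlations in each batch and mapping it to a common standard can be illustrated with a minimal Python sketch. This is not the standardcor API, and the abstract does not specify the form of the null model; the sketch assumes, purely for illustration, a normal null on the Fisher z scale whose center and scale are estimated robustly from the median and MAD, so that a few strong correlations do not distort the fit. The function names and the `target_scale` parameter are hypothetical.

```python
import math
import statistics


def fisher_z(r, eps=1e-12):
    """Fisher z-transform, clamping r away from +/-1 to keep atanh finite."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(r)


def fit_robust_null(correlations):
    """Estimate the center and scale of the bulk of the correlations.

    Assumption (not from the source): the null is normal on the Fisher z
    scale, with the median as center and 1.4826 * MAD as scale.
    """
    z = [fisher_z(r) for r in correlations]
    center = statistics.median(z)
    mad = statistics.median([abs(v - center) for v in z])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("bulk of correlations has zero spread")
    return center, scale


def standardize(correlations, center, scale, target_scale=0.2):
    """Map a batch-specific null onto a shared null N(0, target_scale^2).

    Values are shifted and rescaled on the z scale, then mapped back to
    the correlation scale with tanh. The mapping is monotone, so the
    ordering of correlations within a batch is preserved.
    """
    standardized = []
    for r in correlations:
        z = (fisher_z(r) - center) / scale
        standardized.append(math.tanh(z * target_scale))
    return standardized


# Two hypothetical batches whose bulk correlations have different spreads.
batch_a = [-0.10, -0.05, 0.0, 0.05, 0.10, 0.90]
batch_b = [-0.40, -0.20, 0.0, 0.20, 0.40, 0.95]

std_a = standardize(batch_a, *fit_robust_null(batch_a))
std_b = standardize(batch_b, *fit_robust_null(batch_b))

# After standardization, both batches share the same fitted null.
print(fit_robust_null(std_a))
print(fit_robust_null(std_b))
```

Under these assumptions, the same two steps would be applied to the block of cross-batch correlations for each pair of batches, so that within-batch and between-batch correlations become comparable before clustering.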
