Model-based Standardization of Correlation Coefficients Improves Multi-Omic Clustering and Biological Signal Discovery
Abstract
Multi-omic data pose a particular challenge for Weighted Correlation Network Analysis (WCNA or WGCNA) because platforms, or batches more generally, differ in characteristics such as resolution, accuracy, dynamic range, and sources of spurious variation. When unaccounted for, these differences can bias clustering toward single-batch clusters and make it more sensitive to "noisier" batches. Here we propose mitigating these effects using null models fitted separately to the bulk of analyte-analyte correlations within each batch and across each pair of batches. We then map the batch-specific null models to a standard null model, removing batch-dependent distributional differences. This approach is compatible with any correlation-based clustering approach. Since the null model represents information not captured in individual pairwise correlations, we show how to incorporate this additional information into both distance-based clustering and WCNA. For distance-based clustering, we increase distances corresponding to correlations consistent with the null model. For WCNA, we provide a new soft threshold (adjacency) function based on the likelihood of a correlation under the null model. The resulting network can be easily incorporated into the WCNA workflow. These methods are implemented in the R package standardcor, and we illustrate the package on simulated data as well as an existing multi-omic dataset.
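The idea of mapping each batch-specific null model onto a standard one can be illustrated with a minimal sketch. This is not the standardcor implementation or its API: the function name `standardize_block`, the parameter `target_sd`, and the simulated blocks are hypothetical. The sketch also assumes a swapped-in null model, namely that the bulk of correlations in a block is roughly normal on the Fisher z scale, with its center and spread estimated robustly by the median and the scaled median absolute deviation.

```python
import numpy as np

def standardize_block(r, target_sd=0.2):
    """Map one block of correlations onto a shared null (illustrative only).

    Assumption: the bulk of the block is approximately normal after the
    Fisher z-transform, so the median and scaled MAD describe its null.
    """
    # Keep values strictly inside (-1, 1) so arctanh stays finite.
    r = np.clip(np.asarray(r, dtype=float), -0.999999, 0.999999)
    z = np.arctanh(r)  # Fisher z-transform
    center = np.median(z)
    # 1.4826 * MAD estimates the standard deviation of a normal bulk.
    scale = 1.4826 * np.median(np.abs(z - center))
    if scale == 0:
        raise ValueError("block has no spread; cannot fit a null model")
    # Shift and rescale the block-specific null to the standard null.
    z_std = (z - center) / scale * target_sd
    return np.tanh(z_std)  # back to the correlation scale

# Two simulated blocks with different null centers and spreads.
rng = np.random.default_rng(0)
block_a = np.tanh(rng.normal(0.0, 0.1, size=5000))  # tight, centered block
block_b = np.tanh(rng.normal(0.3, 0.4, size=5000))  # shifted, noisier block

std_a = standardize_block(block_a)
std_b = standardize_block(block_b)
```

After this mapping, both blocks share the same center and spread on the Fisher z scale, so a correlation of a given size carries comparable weight whichever block it came from. In a multi-batch setting the same function would be applied to each within-batch block and each between-batch block of the correlation matrix before clustering.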