Mango: Unearthing Patterns in Large-Scale Biological Data Through Interactive Correlation Analysis
Abstract
Integrating different types of biological data is often challenging because such datasets mix numerical and categorical variables. This mix makes it harder to evaluate causal biological effects, especially when confounders such as population structure, sampling methods, or multi-omics integration can lead to incorrect conclusions. We introduce Mango, an interactive correlation browser for visually exploring any type of tabular data. Mango uses a novel algorithm, Median-Ranked Label Encoding, to correlate numerical and categorical data regardless of their distribution. Our results on genomic and transcriptomic datasets demonstrate that these correlations can effectively distinguish between biases and causal relationships in large-scale data.
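The abstract names Median-Ranked Label Encoding but does not spell out its steps. The following is only a minimal sketch of one plausible reading of the name, not Mango's actual implementation. It assumes each category is replaced by the rank of its median value on the numerical variable, and that a rank-based (Spearman-style) correlation is then computed so that no distributional assumption is needed. The function names `median_rank_encode` and `mrle_correlation` are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")


def median_rank_encode(labels, values):
    """Assumed encoding: map each category to the rank of its median value.

    Categories whose medians tie keep their first-seen order.
    """
    groups = {}
    for label, value in zip(labels, values):
        groups.setdefault(label, []).append(value)
    medians = {label: median(vals) for label, vals in groups.items()}
    ordered = sorted(medians, key=lambda label: medians[label])
    codes = {label: rank for rank, label in enumerate(ordered, start=1)}
    return [codes[label] for label in labels]


def mrle_correlation(labels, values):
    """Spearman-style correlation between encoded categories and values."""
    encoded = median_rank_encode(labels, values)
    return pearson(average_ranks(encoded), average_ranks(values))


# Toy data (invented for illustration): three groups with increasing medians.
labels = ["A", "A", "B", "B", "C", "C"]
values = [1.0, 2.0, 5.0, 6.0, 9.0, 10.0]
encoded = median_rank_encode(labels, values)
rho = mrle_correlation(labels, values)
```

Because the category order in this sketch is derived from the numerical variable itself, the direction of the correlation is fixed by construction, so the result is best read as a strength of association between the categorical and numerical columns.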