RNAcompare: Integrating machine learning algorithms to unveil the similarities of phenotypes based on clinical, multi-omics using Rheumatoid Arthritis and Heart Failure as Case Studies

Abstract

Background

Gene expression analysis is crucial for understanding the biological mechanisms underlying differences between patient subgroups. However, most existing studies focus primarily on transcriptomic data and neglect clinical heterogeneity. Although batch correction methods are commonly used, challenges remain when integrating data across different tissues, omics layers, and diseases. This limitation hampers the ability to connect molecular insights to phenotypic outcomes, thereby restricting clinical translation. Furthermore, the technical complexity of analysing large, heterogeneous datasets poses a barrier for clinicians. To address these challenges, we present RNAcompare, an extension of RNAcare that uses machine learning techniques to integrate clinical and multi-omics data.

Results

RNAcompare overcomes these challenges by providing an interactive, reproducible platform designed to analyse multi-omics data in a clinical context. This tool enables researchers to integrate diverse datasets, conduct exploratory analyses, and identify shared patterns across patient subgroups. The platform facilitates hypothesis generation and produces intuitive visualizations to support further investigation.

As a proof of concept, we applied RNAcompare to link omics data to pain, fatigue, and drug resistance in rheumatoid arthritis (RA), and to disease severity in RA and heart failure (HF). Our analysis reduced selection bias and managed heterogeneity by identifying key contributors to treatment variability. We discovered shared molecular pathways associated with different treatments. Using SHAP (SHapley Additive exPlanations) values, we classified patients into three subgroups based on age, and subsequent analyses confirmed these age-related patterns. Additionally, we uncovered hidden patterns influencing pain and disease severity across different tissues, omics layers, and diseases. Notably, by integrating Causal Forests and Double Machine Learning with clinical phenotypes, RNAcompare provides a novel approach that bypasses traditional batch correction methods.
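
To make the Double Machine Learning idea mentioned above concrete, the following is a minimal sketch of the general "partialling-out" estimator with cross-fitting. It is not RNAcompare's implementation. The function names (`fit_line`, `dml_effect`, `simulate`), the simulated data, and the choice of a simple linear nuisance model are all assumptions made for illustration; in practice the nuisance models would typically be flexible learners, and `x` would be a set of clinical covariates rather than a single simulated variable.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dml_effect(x, t, y, n_folds=5):
    """Partialling-out estimate of the effect of t on y, adjusting for x.

    For each fold, the nuisance models E[t|x] and E[y|x] are fitted on the
    other folds (cross-fitting) and used to residualise the held-out samples.
    The effect is the slope of the y-residuals on the t-residuals.
    """
    n = len(x)
    res_t, res_y = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        xs = [x[i] for i in train]
        a_t, b_t = fit_line(xs, [t[i] for i in train])
        a_y, b_y = fit_line(xs, [y[i] for i in train])
        for i in test:
            res_t[i] = t[i] - (a_t + b_t * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    num = sum(rt * ry for rt, ry in zip(res_t, res_y))
    den = sum(rt * rt for rt in res_t)
    return num / den


def simulate(n=2000, true_effect=2.0, seed=0):
    """Hypothetical data: x confounds both the exposure t and the outcome y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    t = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [true_effect * ti + 1.5 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
    return x, t, y


x, t, y = simulate()
theta_hat = dml_effect(x, t, y)   # adjusted estimate of the simulated effect
naive = fit_line(t, y)[1]         # unadjusted slope, biased by the confounder
```

In this simulation the unadjusted slope `naive` absorbs part of the confounder's influence, whereas `theta_hat` should land close to the simulated effect of 2.0 because the confounder has been regressed out of both the exposure and the outcome before they are compared.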

Conclusion

We introduce RNAcompare, a computational platform designed to compare clinical and multi-omics data across diverse patient cohorts in real time. This tool supports both user-generated and publicly available datasets, offering a robust solution for identifying phenotypic similarities and enhancing our understanding of complex diseases such as RA and HF.

The platform is available at https://github.com/tangmingcan/RNAcompare .
