Direct causal variable discovery leveraging the invariance principle: application in biomedical studies


Abstract

Accurate identification of the direct causal (parental) variables of a target is of primary interest in many applications, especially in the biomedical sciences. Such identification could improve our understanding of disease mechanisms and facilitate the discovery of new biomarkers and therapeutic targets for clinical traits. However, standard machine learning approaches often identify spurious associations, while existing causal inference methods for direct causal variables can be computationally infeasible for high-dimensional biomedical data.

Here, we propose a novel and efficient two-stage approach (I-GCM) to discover direct causal variables (including genetic and clinical variables) for clinical outcomes. The method first employs the PC-simple algorithm for feature screening, then leverages the principle of causal invariance across different environments. Causal relationships are identified by testing for changes across environments in the generalized covariance measure (GCM), which is calculated using flexible gradient-boosted tree models.
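As a rough illustration of the building block named above, the sketch below computes a GCM statistic for testing whether X and Y are conditionally independent given Z: each variable is regressed on Z, the residuals are multiplied, and the mean product is normalised by its standard deviation. This is a minimal sketch under stated assumptions, not the I-GCM implementation: a single-covariate least-squares fit stands in for the gradient-boosted tree models, all function names (`ols_residuals`, `gcm_statistic`, `gcm_pvalue`, `simulate`) are hypothetical, and the step that compares the GCM across environments is omitted because the abstract does not specify it.

```python
import math
import random
from statistics import NormalDist, fmean


def ols_residuals(y, z):
    """Residuals of y after a least-squares fit on one covariate z.

    Assumption: this simple fit is a stand-in for the gradient-boosted
    tree regressions mentioned in the abstract.
    """
    zbar, ybar = fmean(z), fmean(y)
    szz = sum((zi - zbar) ** 2 for zi in z)
    slope = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szz
    return [yi - ybar - slope * (zi - zbar) for zi, yi in zip(z, y)]


def gcm_statistic(x, y, z):
    """Normalised GCM statistic: sqrt(n) * mean(R) / sd(R), R = rx * ry."""
    rx = ols_residuals(x, z)
    ry = ols_residuals(y, z)
    prod = [a * b for a, b in zip(rx, ry)]
    n = len(prod)
    mean = fmean(prod)
    var = fmean([p * p for p in prod]) - mean ** 2
    return math.sqrt(n) * mean / math.sqrt(var)


def gcm_pvalue(x, y, z):
    """Two-sided p-value from the standard normal reference distribution."""
    t = gcm_statistic(x, y, z)
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def simulate(n, effect, seed=0):
    """Toy data (hypothetical): Z drives X and Y; X affects Y with size `effect`."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + effect * xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]
    return x, y, z


if __name__ == "__main__":
    for effect in (0.0, 0.5):
        x, y, z = simulate(2000, effect, seed=1)
        print(effect, round(gcm_statistic(x, y, z), 2), gcm_pvalue(x, y, z))
```

With `effect=0.0` the statistic should behave like a draw from a standard normal, whereas a nonzero effect should push it far from zero. Any regression method could replace `ols_residuals` without changing the rest of the computation.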

We first verified the proposed method through extensive simulations. I-GCM consistently yielded high precision (positive predictive value) and specificity while generally maintaining satisfactory sensitivity, and it outperformed a standard method throughout. Notably, precision exceeded 90% in our simulated scenarios, even in high-dimensional settings.
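For readers less familiar with these evaluation metrics, the following minimal sketch shows how precision, sensitivity, and specificity could be computed when the true parental variables are known, as they are in a simulation. The function name `selection_metrics` and the variable labels are hypothetical and are not taken from the article.

```python
def selection_metrics(selected, true_parents, candidates):
    """Precision, sensitivity and specificity of a selected variable set.

    selected: variables declared causal by a method
    true_parents: variables that are truly direct causes
    candidates: all variables that were considered
    """
    selected, true_parents = set(selected), set(true_parents)
    candidates = set(candidates)
    tp = len(selected & true_parents)               # true positives
    fp = len(selected - true_parents)               # false positives
    fn = len(true_parents - selected)               # false negatives
    tn = len(candidates - selected - true_parents)  # true negatives
    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


if __name__ == "__main__":
    print(selection_metrics({"a", "c"}, {"a", "b"}, {"a", "b", "c", "d", "e"}))
```

In high-dimensional settings most candidates are non-causal, so specificity tends to be high by default and precision is usually the more demanding criterion.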

We then applied I-GCM to the UK Biobank, analyzing genetic and clinical data to identify causal factors for COVID-19 infection/severity and lipid traits (HDL, triglycerides). The analysis recovered many known clinical risk factors, supporting the method’s real-world performance, and uncovered novel putative causal genes and biological pathways that are supported by existing literature.

Importantly, our work pioneers the application of the invariance principle for causal inference in biomedical/clinical studies, and suggests a new avenue for causal discovery in these settings.
