Transfer Learning for Mendelian Randomization with Pleiotropic and Correlated High-Dimensional Exposures and Its Application to Trans-ethnic Populations

Abstract

Most large-scale genome-wide association study (GWAS) cohorts are of European ancestry, so it is critical to generalize findings from European populations to other ethnic groups with smaller sample sizes. Traditional multivariable Mendelian randomization (MR) methods often perform poorly when exposures are high-dimensional and correlated, owing to collinearity and unstable model fitting. To address these challenges, we propose TL-HDMR, a two-step statistical transfer learning method for high-dimensional MR that enhances the detection of causal exposures in target populations. To mitigate the impact of exposure correlations, we incorporate the minimax concave penalty (MCP), which enables asymptotically unbiased estimation of causal effects. We further introduce two pre-transfer procedures to optimize the performance of TL-HDMR: HDMR.SCD, which identifies source datasets conducive to positive transfer, and HDMR.PRESSO, which filters out pleiotropic instruments. We evaluate the method through extensive simulations and compare it with three alternative penalization approaches in high-dimensional scenarios; TL-HDMR achieves the best performance in terms of ROC curves and mean absolute error. To illustrate its practical utility, we apply TL-HDMR to identify ethnic-specific and subtype-specific causal biomarkers for stroke among 899 blood metabolites in a multi-ancestry context (European, East Asian, South Asian and African). The results highlight the ability of TL-HDMR to provide equitable and generalizable causal inferences across diverse populations. This work addresses the dual challenges of high-dimensional correlated exposures and cross-population generalization, thereby enhancing the robustness and inclusivity of causal inference in human health research.
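The abstract names the minimax concave penalty but does not give its form. As an illustration only, the sketch below uses the standard textbook definition of MCP and its univariate thresholding solution for a standardized predictor. The function names `mcp_penalty` and `mcp_threshold`, and the parameters `lam` (penalty level) and `gamma` (concavity, assumed greater than 1), are hypothetical choices made here. This is not the TL-HDMR implementation, whose parametrization and fitting procedure the abstract does not specify.

```python
import numpy as np

def mcp_penalty(beta, lam, gamma):
    """Standard MCP penalty, evaluated elementwise (assumes gamma > 1)."""
    b = np.abs(np.asarray(beta, dtype=float))
    inner = lam * b - b**2 / (2.0 * gamma)  # concave region: |beta| <= gamma * lam
    outer = 0.5 * gamma * lam**2            # flat region: penalty stops growing
    return np.where(b <= gamma * lam, inner, outer)

def mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5 * (z - b)**2 + MCP(b), i.e. the univariate MCP update
    for a standardized predictor (assumes gamma > 1)."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # lasso soft-threshold
    # Small inputs: rescaled soft-threshold. Large inputs: returned unshrunk.
    return np.where(np.abs(z) <= gamma * lam, soft / (1.0 - 1.0 / gamma), z)
```

Inputs larger in magnitude than `gamma * lam` pass through `mcp_threshold` unchanged, whereas a lasso soft-threshold would shrink them by `lam`. This is the property usually cited when MCP is described as reducing estimation bias for large effects.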
