Cross-platform metabolomics imputation using importance-weighted autoencoders
Abstract
Background
Metabolomics data are often generated on different analytical platforms with different methods of identification and quantification, which makes their synthesis and large-scale replication challenging. To address this, we applied generative deep learning to impute metabolites assayed by Metabolon, a commonly used commercial platform, from metabolomic features acquired on an untargeted liquid chromatography-mass spectrometry (LC-MS) platform.
Methods
We used a subset of 979 samples from the Airwave Health Monitoring Study, assayed with both the Metabolon and the National Phenome Centre at Imperial College (NPC) LC-MS assays, to develop an ensemble of importance-weighted autoencoders (IWAEs) that performs cross-platform metabolomics imputation between the two assays. Using the ensemble, we generated a Metabolon-equivalent dataset for 2,971 additional Airwave samples that lacked prior Metabolon measurements. We estimated observational associations of the metabolites with two clinical outcomes, body mass index (BMI) and C-reactive protein (CRP). We validated the ensemble and the imputed data by assessing the concordance of these associations between the imputed Metabolon dataset and metabolite levels measured by Metabolon and NPC in the Airwave study and by the Nightingale platform in the UK Biobank.
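The importance-weighted objective that gives IWAEs their name can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical one-dimensional toy model with prior p(h) = N(0, 1), likelihood p(x | h) = N(h, 1) and a Gaussian proposal q(h | x), and computes the standard IWAE bound log (1/K) Σ_k p(x, h_k) / q(h_k | x) with h_1, ..., h_K drawn from q. The helper names (`log_normal`, `logmeanexp`, `iwae_bound`) are illustrative only.

```python
import numpy as np


def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def logmeanexp(a, axis=-1):
    """Numerically stable log(mean(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))


def iwae_bound(x, q_mean, q_var, k, rng):
    """Single Monte Carlo estimate of the IWAE bound for the toy model.

    Hypothetical toy model: p(h) = N(0, 1), p(x | h) = N(h, 1).
    Draws k samples from q(h | x) = N(q_mean, q_var) and returns
    log (1/k) * sum_i p(x, h_i) / q(h_i | x).
    """
    h = rng.normal(q_mean, np.sqrt(q_var), size=k)
    log_w = (log_normal(h, 0.0, 1.0)          # log p(h)
             + log_normal(x, h, 1.0)           # log p(x | h)
             - log_normal(h, q_mean, q_var))   # - log q(h | x)
    return logmeanexp(log_w)


rng = np.random.default_rng(0)
x = 1.5
true_log_px = log_normal(x, 0.0, 2.0)  # exact marginal in the toy model: x ~ N(0, 2)

# Exact posterior q(h | x) = N(x/2, 1/2): every weight equals p(x), so the bound is tight.
print(iwae_bound(x, x / 2, 0.5, k=5, rng=rng) - true_log_px)  # ~0 up to rounding

# Prior used as the proposal: averaged over repeats, the bound rises towards log p(x) as k grows.
for k in (1, 10, 100):
    est = np.mean([iwae_bound(x, 0.0, 1.0, k, rng) for _ in range(2000)])
    print(f"k={k:3d}  mean bound={est:.3f}  log p(x)={true_log_px:.3f}")
```

With a poor proposal the single-sample (K = 1) bound, which is the ordinary VAE objective, is loose; averaging K importance weights inside the logarithm tightens it, which is the motivation for the importance-weighted variant.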
Results
Our imputation ensemble generated samples highly correlated with their real values across all Metabolon metabolites in a held-out test set, with a mean sample correlation of 0.61 (IQR 0.55-0.67). The well-imputed subset comprised 199 (22%) of the metabolites in the real Metabolon dataset, defined as those for which the imputed values accounted for at least 55% of the original variance (R² ≥ 0.55) with minimal uncertainty (R² variance ≤ 0.025). This subset included 43 metabolites not previously identified by our LC-MS platform. When comparing the associations of the real and imputed Metabolon metabolites with BMI and CRP, the standardised beta coefficients were highly correlated (ρ = 0.93 for BMI and 0.89 for CRP) with a minimal mean difference (0.005 (0.04) for BMI, 0.005 (0.04) for CRP). Similar concordance was observed between the imputed Metabolon metabolites and the equivalent metabolites measured in the UK Biobank (mean difference -0.007 (0.05) for BMI, 0.01 (0.04) for CRP) and by our LC-MS platform (mean difference -0.013 (0.04) for BMI, -0.019 (0.04) for CRP).
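The well-imputed criterion and the association comparison can be sketched as follows. This is a hypothetical illustration on synthetic data, not the study's pipeline. It assumes that R² is the coefficient of determination computed separately for each ensemble member on held-out samples, that "R² variance" is the variance of those per-member values, that standardised betas come from unadjusted regressions of a z-scored outcome on each z-scored metabolite (any covariate adjustment used in the study is omitted), and that ρ is a Spearman rank correlation. All function names are illustrative.

```python
import numpy as np


def r2_per_metabolite(y_true, y_pred):
    """Coefficient of determination for each column (metabolite)."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


def well_imputed(y_true, member_preds, r2_min=0.55, var_max=0.025):
    """Flag metabolites whose ensemble R^2 is both high and stable.

    member_preds has shape (n_members, n_samples, n_metabolites).
    Assumption: "R^2 variance" is the variance of per-member R^2 values.
    """
    r2 = np.stack([r2_per_metabolite(y_true, p) for p in member_preds])
    return (r2.mean(axis=0) >= r2_min) & (r2.var(axis=0) <= var_max)


def standardised_betas(X, outcome):
    """Slope of the z-scored outcome on each z-scored metabolite (no covariates).

    Without covariates this slope equals the Pearson correlation.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (outcome - outcome.mean()) / outcome.std()
    return Xz.T @ yz / len(yz)


def spearman(a, b):
    """Spearman rank correlation via double argsort (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


# Synthetic example: 4 metabolites imputed with increasing noise by 5 members.
rng = np.random.default_rng(1)
n, p, m = 500, 4, 5
real = rng.normal(size=(n, p))
members = real + rng.normal(size=(m, n, p)) * np.array([0.1, 0.3, 1.0, 2.0])
print(well_imputed(real, members))  # should flag only the two low-noise metabolites

outcome = real @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(size=n)
b_real = standardised_betas(real, outcome)
b_imp = standardised_betas(members.mean(axis=0), outcome)  # ensemble-mean imputation
diff = b_imp - b_real
print(spearman(b_real, b_imp), diff.mean(), diff.std())
```

Averaging member predictions before estimating associations is one simple way to use the ensemble; the spread of per-member R² then serves as an uncertainty filter.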
Conclusion
This methodological innovation provides a scalable and accurate approach to cross-platform imputation, which could allow researchers to aggregate individual-level metabolomics data from different epidemiological studies, replicate findings or conduct meta-analyses.