Machine Learning Approaches to Predict Alcohol Consumption from Biomarkers in the UK Biobank
Abstract
Background
Measuring and estimating alcohol consumption (AC) is important for individual health, public health, and societal benefit. While self-report and diagnostic interviews are commonly used, biologically based indices can offer a complementary approach.
Methods
We evaluated machine learning (ML)-based predictions of AC using blood- and urine-derived biomarkers. This research was conducted using the UK Biobank (UKB) Resource. In addition to the number of alcoholic drinks per week (DPW), four other related phenotypes were predicted for performance comparison. Five ML models were assessed: LASSO, ridge regression, gradient boosting machines (GBM), model-based boosting (MBOOST), and extreme gradient boosting (XGBOOST).
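For reference, the two penalized linear models among these are usually written as below, where y_i is participant i's DPW, x_i is a vector of p biomarker and covariate values, and λ ≥ 0 sets the penalty strength. These are the standard textbook forms, not necessarily the exact specification used in this study: the abstract does not state the loss scaling or how λ was tuned, and conventions differ between software packages.

```latex
\begin{aligned}
\hat{\beta}^{\text{LASSO}} &= \arg\min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\beta\bigr)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \\
\hat{\beta}^{\text{ridge}} &= \arg\min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\beta\bigr)^2 + \lambda\sum_{j=1}^{p}\beta_j^{2}
\end{aligned}
```

The L1 penalty can drive some coefficients exactly to zero, so LASSO also performs variable selection among the biomarkers, whereas the squared (L2) penalty of ridge regression shrinks coefficients toward zero without eliminating them.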
Results
All five ML methods achieved moderate prediction of DPW (r² = 0.304–0.356), with biomarkers significantly improving prediction over a model using only known covariates and liver enzymes (r² = 0.105). XGBOOST achieved the best prediction performance (r² = 0.356, MAE = 5.214), at the cost of greater model complexity and training resources than the other ML methods. All ML models accurately predicted whether subjects were heavy drinkers (DPW > 8 for women and DPW > 15 for men) and produced explainable models that highlighted the role of biomarkers in predicting DPW. Phenotypic correlations were similar across methods; for XGBOOST, heritability estimates were similar for observed (h² = 0.064) and predicted (h² = 0.077) DPW. The estimated genetic correlation between observed and predicted DPW was 0.877.
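To make the reported evaluation quantities concrete, here is a minimal Python sketch of the two accuracy metrics and the sex-specific heavy-drinking flag. The function names (`squared_pearson`, `mean_absolute_error`, `is_heavy_drinker`) and the toy values are hypothetical illustrations, not the study's code or data. The abstract does not say whether its r² is the squared Pearson correlation or the coefficient of determination; this sketch assumes the squared Pearson correlation. The DPW thresholds are taken from the results above.

```python
from statistics import mean


def squared_pearson(observed, predicted):
    """Squared Pearson correlation (assumed definition of r² here)."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted DPW."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def is_heavy_drinker(dpw, sex):
    """Hypothetical helper: heavy drinking is DPW > 8 for women, DPW > 15 for men."""
    threshold = {"female": 8, "male": 15}[sex]
    return dpw > threshold


# Toy DPW values for illustration only (not UK Biobank data).
observed = [2, 10, 20, 5]
predicted = [3, 8, 18, 6]
print(mean_absolute_error(observed, predicted))  # 1.5
print(squared_pearson(observed, predicted))
print(is_heavy_drinker(9, "female"), is_heavy_drinker(9, "male"))  # True False
```

Note that the same 9 drinks per week is classified differently by sex, which is why the heavy-drinking outcome cannot be derived from predicted DPW alone without the participant's sex.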
Conclusions
Predicting AC from ML-based biological measures provides an opportunity to identify individuals at increased risk of heavy AC, offering a complementary avenue for risk assessment beyond self-report, screening instruments, or structured interviews, which have known biases. In addition, explainable AI tools identified a constellation of biomarkers associated with AC.