Cross-Tissue Epigenetic Age Prediction with Compact CpG Panels
Abstract
Epigenetic age estimators based on DNA methylation provide powerful biomarkers of aging, but most clocks are tissue‑specific and rely on large CpG panels. Here we develop compact, interpretable machine learning models that capture age‑related DNA methylation patterns in human brain and blood, and we evaluate their cross‑tissue behavior using public Illumina 450K datasets. Using frontal cortex methylation profiles from GSE41826, we constructed an age‑group classifier (child vs adult/older) based on XGBoost and compared its performance with penalized logistic regression and random forests. After addressing class imbalance by up‑sampling, the brain XGBoost model achieved high accuracy and balanced precision–recall. SHAP (SHapley Additive exPlanations) analysis identified a small panel of CpG sites with strong influence on age classification, several of which map to genes previously implicated in development and aging and overlap with CpGs used by established epigenetic clocks. We then applied the brain‑trained model to a large peripheral blood dataset (GSE40279) to test cross‑tissue generalization, using only the CpGs shared between tissues. Despite limited CpG overlap, the model reliably distinguished child‑like from adult‑like methylation patterns in blood and highlighted a subset of older donors with “youthful” methylation signatures. Finally, we built a blood‑specific three‑class age classifier (young adult, middle‑aged, older adult) and compared tree‑based models with a TabTransformer architecture, finding that gradient‑boosted trees combined with SHAP provided a favorable balance of accuracy and interpretability. These results indicate that compact, biologically interpretable CpG panels can capture age‑related methylation signatures shared across human tissues, with potential relevance for disease pathways and precision health applications.
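The abstract names several data-handling steps without showing how they are done: up-sampling the minority age class before training, restricting the blood data to CpGs shared with the brain panel, and using SHAP to pick a small influential panel. The following is a minimal, dependency-free sketch of one way these steps could look. It is not the authors' code: the function names, the use of random oversampling with replacement, the list-of-lists layout of beta values (samples x CpGs), the mean |SHAP| ranking criterion, and the toy CpG IDs and values are all illustrative assumptions.

```python
import random
from collections import Counter

# Hypothetical helpers; assumptions (not stated in the preprint): random
# oversampling with replacement, beta values stored as list-of-lists
# (samples x CpGs), and panel selection by mean |SHAP|.


def upsample_minority(rows, labels, seed=0):
    """Oversample each class with replacement until it matches the largest class.

    Intended for the training split only, so duplicated samples never
    leak into the evaluation split.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            i = rng.choice(members)
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


def shared_cpg_panel(train_cpgs, target_cpgs):
    """CpG IDs measured in both tissues, kept in the training matrix's column order."""
    target = set(target_cpgs)
    return [cpg for cpg in train_cpgs if cpg in target]


def subset_columns(matrix, cpg_ids, keep):
    """Return the columns of `matrix` for the CpG IDs in `keep`, in that order."""
    pos = {cpg: j for j, cpg in enumerate(cpg_ids)}
    cols = [pos[cpg] for cpg in keep]
    return [[row[j] for j in cols] for row in matrix]


def rank_by_mean_abs_shap(shap_values, cpg_ids, k):
    """Top-k CpG IDs by mean absolute SHAP value across samples."""
    n = len(shap_values)
    scores = [sum(abs(row[j]) for row in shap_values) / n for j in range(len(cpg_ids))]
    order = sorted(range(len(cpg_ids)), key=lambda j: scores[j], reverse=True)
    return [cpg_ids[j] for j in order[:k]]


if __name__ == "__main__":
    # Toy data with placeholder IDs; not real probes or measurements.
    brain_cpgs = ["cg01", "cg02", "cg03"]
    blood_cpgs = ["cg03", "cg01", "cg09"]
    brain = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
    ages = ["adult", "adult", "child"]
    blood = [[0.9, 0.8, 0.7]]

    panel = shared_cpg_panel(brain_cpgs, blood_cpgs)
    X_brain, y_brain = upsample_minority(subset_columns(brain, brain_cpgs, panel), ages)
    X_blood = subset_columns(blood, blood_cpgs, panel)
    print(panel)    # ['cg01', 'cg03']
    print(y_brain)  # ['adult', 'adult', 'child', 'child']
    print(X_blood)  # [[0.8, 0.9]]
```

Aligning the blood matrix to the brain panel's column order matters because a tree model addresses features by position, not by CpG ID. In a full pipeline, the balanced brain matrix would presumably be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`, and the SHAP matrix for `rank_by_mean_abs_shap` could come from `shap.TreeExplainer`; both libraries are left out here to keep the sketch self-contained.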