rplec: An R package of placental epigenetic clock to estimate aging by DNA-methylation-based gestational age
Abstract
Background
Recent placental epigenetic clocks (PlECs) have been reported to be robust when applied to cases with maternal or fetal adverse conditions. However, their accuracy in estimating gestational age (GA) was lower in the earlier trimesters. We aimed to develop an R package implementing a PlEC that estimates placental aging from DNA-methylation-based GA (DNAm-GA).
Methods
We used 1742 placental DNA methylation samples provided by the 2024 Placental Clock DREAM Challenge. Our PlEC was trained using only beta values at the CpG sites common to the Illumina Infinium HumanMethylation-450 (n = 930) and -850 (n = 912) BeadChip arrays, of which 100 samples were used as the validation set. External validation was performed independently by the challenge organizer on a test set that is not publicly available (n = 384). Elastic net regression was applied to develop a three-stage prediction model that estimates: (1) DNAm-GA among normal samples; (2) a first residual DNAm-GA among samples with known phenotypes leading to earlier termination; and (3) a second residual DNAm-GA that depends on the GA estimated in the previous stages. An R package was developed to wrap our scikit-learn models in a single function and to use DNAm-GA in placental aging studies.
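The three-stage design described above can be illustrated with a minimal sketch of additive residual stacking. It is not the authors' implementation: a one-predictor least-squares fit stands in for elastic net regression, each sample is reduced to a single hypothetical methylation summary value instead of beta values at thousands of CpG sites, and the way the stages are summed is an assumption. The names `fit_line` and `fit_three_stage` are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """One-predictor least squares; an illustrative stand-in for elastic net."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def fit_three_stage(beta, ga, is_normal):
    """Fit three stacked models and return a prediction function.

    beta      : one hypothetical methylation summary value per sample
    ga        : known gestational age in weeks
    is_normal : True for samples without adverse phenotypes
    Assumes both groups contain samples.
    """
    normal = [i for i, flag in enumerate(is_normal) if flag]
    adverse = [i for i, flag in enumerate(is_normal) if not flag]

    # Stage 1: estimate GA from methylation, using normal samples only.
    stage1 = fit_line([beta[i] for i in normal], [ga[i] for i in normal])
    pred1 = [stage1(b) for b in beta]

    # Stage 2: model the first residual, using samples with adverse phenotypes.
    stage2 = fit_line(
        [beta[i] for i in adverse],
        [ga[i] - pred1[i] for i in adverse],
    )
    pred2 = [p + stage2(b) for p, b in zip(pred1, beta)]

    # Stage 3: model the second residual as a function of the GA estimated so far.
    stage3 = fit_line(pred2, [g - p for g, p in zip(ga, pred2)])

    def predict(b):
        # Assumed combination: the stage outputs are summed.
        estimate = stage1(b) + stage2(b)
        return estimate + stage3(estimate)

    return predict
```

Each later stage only corrects what the earlier stages left unexplained, so the first model stays interpretable as the clock for normal placentas.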
Results
Our PlEC required beta values at 10,433 CpG sites and achieved the top performance in the validation set. On the test set, the root mean squared error (RMSE) was 1.245 weeks. The RMSE for preterm samples was lower (0.558, 95% confidence interval [CI] 0.545, 0.570) than that of two previous PlECs evaluated on the common dataset: (1) Lee et al (1.696, 95% CI 1.667, 1.724); and (2) Mayne et al (4.018, 95% CI 3.927, 4.108). We developed the rplec R package, which provides two functions for preprocessing the input and estimating DNAm-GA, and two further functions for quality control and for using DNAm-GA in placental aging studies. The simplified version of the PlEC performed similarly to the original scikit-learn model, with an RMSE of 0.102 (95% CI 0.101, 0.104), a small discrepancy we attribute to differences in how Python and R handle floating-point numbers.
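For readers unfamiliar with the metric, the following is a minimal sketch of how an RMSE, overall and restricted to preterm samples, could be computed. The function names are hypothetical, and the 37-week preterm cutoff is the conventional definition, assumed here because the abstract does not state one. How the reported confidence intervals were derived is not described, so none is computed.

```python
import math


def rmse(estimated, observed):
    """Root mean squared error between estimated and observed GA (weeks)."""
    if len(estimated) != len(observed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_preterm(estimated, observed, cutoff_weeks=37.0):
    """RMSE over samples whose observed GA is below the assumed preterm cutoff."""
    pairs = [(e, o) for e, o in zip(estimated, observed) if o < cutoff_weeks]
    if not pairs:
        raise ValueError("no preterm samples in the input")
    return rmse([e for e, _ in pairs], [o for _, o in pairs])
```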
Conclusions
Our R package precisely estimated DNAm-GA, and our analytical framework can use DNAm-GA in placental aging studies. Our PlEC also allows individual assessment of placental aging in clinical settings via the residual DNAm-GA. Future studies are needed to refine the first residual GA estimation and to reduce the number of predictors while maintaining accuracy.