Maternal smoking DNA methylation risk score associated with health outcomes in offspring of European and South Asian ancestry

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This study offers a useful advance by introducing a cord blood DNA methylation score for maternal smoking effects, with the inclusion of diverse cohorts. However, the overall strength of evidence is deemed incomplete, due to concerns regarding low exposure levels, low statistical power, potential overfitting, and the need for clearer descriptions of statistical methods. Building more directly from the existing evidence base, exploring differences between ancestries, and considering additional health outcomes would help to enhance the study.

Abstract

Maternal smoking has been linked to adverse health outcomes in newborns, but the extent of its impact on newborn health has not been quantified through an aggregated cord blood DNA methylation (DNAm) score. Here we examine the feasibility of using cord blood DNAm scores, with large external studies as discovery samples, to capture the epigenetic signature of maternal smoking and its influence on newborns in White European and South Asian populations. We first examined the association of individual CpGs with cigarette smoking and smoking exposure during pregnancy in two White European birth cohorts (n = 744). Several previously reported genes for maternal smoking were supported, with the strongest and most consistent signal from the GFI1 gene (6 CpGs with p < 5×10⁻⁵). Leveraging established CpGs for maternal smoking, we constructed a cord blood epigenetic score of maternal smoking that was internally validated in one of the European-origin cohorts (n = 347). This score was then tested for association with smoking status, secondhand smoke exposure during pregnancy, and health outcomes measured in offspring after birth in an independent White European cohort (n = 397) and a South Asian birth cohort (n = 504). The epigenetic maternal smoking score was strongly associated with smoking status during pregnancy (OR = 1.09 [1.07, 1.10], p = 1.96×10⁻³²) and with more hours of self-reported smoking exposure per week (1.97 [1.22, 2.71], p = 2.80×10⁻⁷) in White Europeans, but not with self-reported exposure (p > 0.05) in South Asians. The same score was consistently associated with smaller birth size (-0.22 cm [-0.35, -0.083], p = 0.0016) and lower birth weight (-0.05 kg [-0.075, -0.025], p = 3.42×10⁻⁴) in the combined South Asian and White European cohorts. This cord blood epigenetic score can help identify babies exposed to maternal smoking and assess its long-term impact on growth.
Notably, these results indicate a consistent association between the DNAm signature of maternal smoking and small body size and low birthweight in newborns of both White European mothers who reported some smoking and South Asian mothers who were not themselves active smokers.
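
A methylation score of this kind is a weighted sum of CpG beta values. The sketch below is a minimal illustration under stated assumptions: the CpG identifiers and weights in `HYPOTHETICAL_WEIGHTS` are placeholders, not the study's 11-CpG score, and `methylation_score` is an illustrative name rather than the authors' code.

```python
from typing import Mapping

# Placeholder CpG IDs and weights (hypothetical); NOT the study's 11-CpG score.
HYPOTHETICAL_WEIGHTS = {"cpg_1": -1.8, "cpg_2": 0.9, "cpg_3": -0.6}


def methylation_score(
    betas: Mapping[str, float],
    weights: Mapping[str, float],
    intercept: float = 0.0,
) -> float:
    """Return intercept + sum(weight * beta) over every CpG in `weights`.

    Raises KeyError when a scored CpG is missing from the sample, instead of
    silently computing the score on a subset of CpGs.
    """
    missing = [cpg for cpg in weights if cpg not in betas]
    if missing:
        raise KeyError(f"sample lacks scored CpGs: {missing}")
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


# Beta values (0-1 methylation proportions) for one made-up newborn;
# CpGs that are not part of the score are ignored.
newborn = {"cpg_1": 0.62, "cpg_2": 0.35, "cpg_3": 0.80, "cpg_other": 0.50}
print(methylation_score(newborn, HYPOTHETICAL_WEIGHTS))
```

Raising on missing CpGs, rather than scoring on whatever subset is present, is a deliberate choice here: cohorts measured on different platforms may not cover every CpG in a score.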

Article activity feed

  1. Author Response

    We thank the reviewers for taking the time to read and review our manuscript. The reviewers raise good points regarding the sample size and the low smoking exposure in the South Asian cohort, which reflects its distinct cultural and social practices. We recognize these as limitations of the paper and will discuss them more extensively in the revised version. With respect to sample size, we are not attempting discovery but rather applying DNAm scores derived from large, external discovery samples. As such, although our sample sizes (n = 300–500) may seem low for a typical EWAS, they are in a similar range to the replication samples in other studies.
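
    Because power for a binary exposure depends heavily on how many participants are exposed, not only on total n, a quick normal-approximation calculation can make the sample-size discussion concrete. The sketch below assumes a two-sided two-sample z-test on a standardized mean difference `d`; the effect size, the exposed/unexposed splits, and the stricter alpha are hypothetical, chosen only so that the total falls in the n = 300–500 range quoted above. `two_group_power` is an illustrative name.

```python
from statistics import NormalDist


def two_group_power(d: float, n_exposed: int, n_unexposed: int,
                    alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test for a standardized
    mean difference d (normal approximation; the opposite rejection tail,
    which contributes almost nothing, is ignored)."""
    std_normal = NormalDist()
    se = (1 / n_exposed + 1 / n_unexposed) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(d) / se - z_crit)


# Hypothetical: same total n (350), different numbers of exposed mothers.
for n_exp in (20, 60, 175):
    print(n_exp, round(two_group_power(0.5, n_exp, 350 - n_exp), 3))

# A stricter, EWAS-style alpha (hypothetical 5e-5) lowers power further.
print(round(two_group_power(0.5, 20, 330, alpha=5e-5), 3))
```

    With the total held fixed, power rises as the groups become more balanced, which is why a small number of smokers limits power even when the cohort itself is moderately sized.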

    We would also like to take this opportunity to emphasize that there is no possible overfitting, as the score was tested in studies (FAMILY and START) independent of both the discovery set (Joubert et al., 2016; n > 5,000) and the LASSO validation set (CHILD; n = 352). In other words, the participants used for LASSO validation were not used in testing. This design deliberately leverages the larger sample sizes of external studies to select more plausible candidate CpGs for the model. By comparison, the discovery sample in Reese et al. (2017) had only n = 1,057.
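
    The design described above (fit penalized weights in one cohort, then apply the frozen weights to participants who played no part in fitting) can be sketched as follows. This is a minimal sketch under assumptions: a plain least-squares LASSO fitted by cyclic coordinate descent, with an unpenalized intercept, stands in for whatever LASSO implementation, outcome coding, and penalty tuning the authors actually used, none of which this text specifies. `fit_lasso` and `score_independent_cohort` are hypothetical names, and the data are made up.

```python
from typing import Iterable, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
              n_sweeps: int = 500) -> tuple[float, list[float]]:
    """Minimize (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by cyclic coordinate
    descent. The intercept b0 is unpenalized (handled by centering)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals while every b_j is 0
    col_ss = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # constant CpG: coefficient stays at 0
            # Covariance of CpG j with the partial residual (fit without CpG j).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            if new_bj != b[j]:
                for i in range(n):
                    resid[i] -= Xc[i][j] * (new_bj - b[j])
                b[j] = new_bj
    intercept = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return intercept, b


def score_independent_cohort(intercept: float, coefs: Sequence[float],
                             test_X: Sequence[Sequence[float]],
                             test_ids: Iterable[str],
                             fit_ids: Iterable[str]) -> list[float]:
    """Apply frozen weights to a test cohort, refusing any participant overlap."""
    overlap = set(test_ids) & set(fit_ids)
    if overlap:
        raise ValueError(f"{len(overlap)} participant(s) used in both fitting and testing")
    return [intercept + sum(c * x for c, x in zip(coefs, row)) for row in test_X]


# Made-up "validation" cohort: 4 participants x 2 candidate CpGs.
X_fit = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y_fit = [1.0, 3.0, 5.0, 7.0]
b0, coefs = fit_lasso(X_fit, y_fit, lam=0.1)
print(score_independent_cohort(b0, coefs, [[4.0, 1.0]], ["T1"], ["V1", "V2", "V3", "V4"]))
```

    Because the L1 penalty acts on raw coefficients, CpGs are often standardized before fitting; that step is omitted here for brevity.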

    The validated score was then tested in new datasets (FAMILY and START), where FAMILY showed a more significant association than the original validation sample (CHILD). For the continuous smoking-severity outcome (0 for no smoking, 1 for quit before pregnancy, 2 for quit during pregnancy, and 3 for current smoker), the mean squared error was 0.68 in CHILD and 1.43 in FAMILY, which indicates good fit, while the AUC for predicting current vs. non-smokers was 0.86 in CHILD and 0.9 in FAMILY. Taken together, these results suggest that the MRS (methylation risk score) is not overfitted, that is, it does not fail “to fit to additional data or predict future observations reliably”.
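
    The two fit statistics quoted above can be computed directly. The sketch below uses the 0–3 smoking-severity coding from the response for the mean squared error, and computes the AUC as the Mann-Whitney probability that a randomly chosen current smoker scores higher than a randomly chosen non-smoker, counting ties as one half; this probability equals the area under the ROC curve. The scores and predictions in the usage lines are made up, and the function names are illustrative.

```python
from typing import Sequence


def mean_squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability that a random case scores
    higher than a random control, with ties counted as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Severity coding from the response: 0 = no smoking, 1 = quit before pregnancy,
# 2 = quit during pregnancy, 3 = current smoker. Predictions are made up.
severity = [0, 0, 1, 2, 3, 3]
predicted = [0.2, 0.5, 1.1, 1.4, 2.6, 2.0]
print("MSE:", round(mean_squared_error(severity, predicted), 3))

# AUC for current smokers (cases) vs. non-smokers (controls), made-up scores.
print("AUC:", auc([2.6, 2.0], [0.2, 0.5]))
```

    Note that the MSE is on the scale of the 0–3 outcome coding, so its magnitude is only comparable across cohorts that use the same coding.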

    In terms of value, our derived score contained 11 CpGs, which overlapped with only 2 of the 28 CpGs in the score derived in the reference provided (Reese, EHP, 2017, PMID 27323799), but the two scores shared four genes that contributed the most weight (MYO1G, CYP1A1, AHRR, and GFI1). In fact, using the 7 CpGs from the Reese score that were present in all cohorts, we obtained slightly worse performance in CHILD (validation cohort; ANOVA p = 4.1E-5, AUC 0.74), and that score was not associated with smoking history in FAMILY (testing cohort; p = 0.13). However, we agree with the reviewer that including more CpGs would improve performance: using the 24 of 28 CpGs available in CHILD (HM450K), we obtained slightly better results (ANOVA p = 3.8E-7, AUC 0.94), but this was mostly due to the 14 of those 24 CpGs that showed evidence of association with maternal smoking according to the EWAS Catalog. In conclusion, we believe our score captures the core genes with robust evidence of association and is more parsimonious to apply to external data, but it could also benefit from a larger sample size to capture CpGs that are moderately associated with maternal smoking.
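
    The comparison above restricts a published score to the CpGs measured in every cohort. A minimal sketch of that filtering step follows; the CpG identifiers, weights, and array contents are hypothetical, and the helper `restrict_to_shared` (an illustrative name) simply drops unavailable CpGs without refitting the remaining weights, which is one reason a truncated score can perform worse than the original.

```python
def restrict_to_shared(weights: dict[str, float],
                       cohort_cpgs: dict[str, set[str]]) -> dict[str, float]:
    """Keep only score CpGs measured in every cohort; weights are NOT refitted."""
    if not cohort_cpgs:
        raise ValueError("need at least one cohort")
    shared = set.intersection(*cohort_cpgs.values())
    return {cpg: w for cpg, w in weights.items() if cpg in shared}


# Hypothetical published score and per-cohort CpG coverage.
published = {"cpg_a": 0.4, "cpg_b": -0.7, "cpg_c": 0.2, "cpg_d": -0.1}
arrays = {
    "cohort_450k": {"cpg_a", "cpg_b", "cpg_c", "cpg_d", "cpg_z"},
    "cohort_panel": {"cpg_a", "cpg_c", "cpg_z"},
}
kept = restrict_to_shared(published, arrays)
print(f"{len(kept)}/{len(published)} CpGs retained: {sorted(kept)}")
# prints: 2/4 CpGs retained: ['cpg_a', 'cpg_c']
```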

  3. Reviewer #1 (Public Review):

    Summary:

    The authors report on the development of the first cord blood DNA methylation score to capture the epigenetic effects of maternal smoking. The score was built in a White European cohort and tested in White European and South Asian ancestry cohorts. Additionally, epigenome-wide association studies were conducted to quantify the impact of maternal smoking on newborn health.

    Strengths:

    The main strengths include the use of multiple cohorts of different ancestries. This is also the first study to build a cord blood predictor of maternal smoking.

    Weaknesses:

    The manuscript could benefit from a more detailed description of the methods, especially those used to derive the MRS for maternal smoking, which appears to involve overfitting. In particular, adding a flow chart would be very helpful to guide the reader through the data and analyses. The FDR correction in the EWAS corresponds to a fairly liberal p-value threshold.
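
    To make the point about the FDR threshold concrete: the Benjamini-Hochberg (BH) procedure converts an FDR level q into an effective raw p-value cutoff that depends on the whole p-value distribution, and that cutoff is never stricter than a Bonferroni cutoff of q/m. The sketch below assumes q = 0.05 and uses ten made-up p-values; the FDR level and the number of CpGs tested in the EWAS are not stated here, and `bh_cutoff` is an illustrative name.

```python
from typing import Optional, Sequence


def bh_cutoff(pvals: Sequence[float], q: float = 0.05) -> Optional[float]:
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Returns the largest sorted p-value p_(k) satisfying p_(k) <= k * q / m;
    every test with p <= that cutoff is declared significant. Returns None
    when no test passes.
    """
    m = len(pvals)
    cutoff = None
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * q / m:
            cutoff = p
    return cutoff


# Made-up p-values for m = 10 tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("BH cutoff:", bh_cutoff(pvals))
print("Bonferroni cutoff:", 0.05 / len(pvals))
```

    In this toy example the BH cutoff is 0.008 versus 0.005 for Bonferroni; in an EWAS with hundreds of thousands of CpGs, how much looser the BH cutoff becomes depends on how many small p-values the analysis produces.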

  4. Reviewer #2 (Public Review):

    Summary:

    The authors generated a DNA methylation score in cord blood for detecting exposure to cigarette smoke during pregnancy. They then asked if it could be used to predict height, weight, BMI, adiposity, and WHR throughout early childhood.

    Strengths:

    The study included two cohorts of European ancestry and one of South Asian ancestry.

    Weaknesses:

    1. The number of mothers who self-reported any smoking was very low, much lower than in the general population and practically non-existent in the South Asian population. As a result, all analyses appeared to have been underpowered. It is possibly for this reason that the authors chose to generate their DNA methylation model using previously published summary statistics. The resulting score is not of great value in itself due to the low-powered dataset used to estimate covariance between CpG sites. In fact, a score was generated for a much larger, better-powered dataset several years ago (Reese, EHP, 2017, PMID 27323799).

    2. The conclusion that "even minimal smoking exposure in South Asian mothers who were not active smokers showed a DNAm signature of small body size and low birthweight in newborns" is not warranted because no analyses were performed to show that the association between DNA methylation and birth size/weight was driven by maternal smoking.

    3. Although some mothers were likely exposed to second-hand smoke and/or pollution, data on these exposures were either non-existent or not included in this study. Including them would have allowed a more novel investigation of the effects of smoke exposure on the pregnancies of non-smoking mothers.

    4. One of the European cohorts and half of the South Asian cohort had DNA methylation measured at only 2,500 CpG sites. This set included only 125 sites previously linked to prenatal smoking, and the resulting model of prenatal smoking was small (only 11 CpG sites). A larger model might have been more powerful.

    5. The health outcomes investigated are potentially interesting, but other, possibly more important, outcomes such as birth complications, asthma, and intellectual impairment are known to be associated with prenatal smoking.
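
    Point 2 above asks whether the association between the methylation score and birth size or weight is driven by maternal smoking. One simple check, sketched here under assumptions, is to compare the score's coefficient for a growth outcome with and without a smoking covariate in an ordinary least-squares model; a formal mediation analysis would be a more complete alternative. The data are made up and constructed to be exactly linear so that the adjusted fit recovers known coefficients, and `ols` is a hypothetical helper that solves the normal equations by Gauss-Jordan elimination, adequate only for a handful of predictors.

```python
def ols(predictors: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares fit with an intercept; returns [intercept, b1, b2, ...].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Adequate for a few predictors, not for EWAS scale.
    """
    X = [[1.0, *row] for row in predictors]
    p = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    c = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
                c[r] -= factor * c[col]
    return [c[i] / A[i][i] for i in range(p)]


# Made-up data: methylation score, maternal smoking (0/1), and an outcome
# constructed as 1 + 2*score + 3*smoking, so the true score effect is 2.
score = [0.1, 0.3, 0.8, 1.0, 0.2, 0.6]
smoking = [0, 0, 1, 1, 0, 1]
outcome = [1 + 2 * s + 3 * m for s, m in zip(score, smoking)]

unadjusted = ols([[s] for s in score], outcome)
adjusted = ols([[s, m] for s, m in zip(score, smoking)], outcome)
print("score effect, unadjusted:", round(unadjusted[1], 3))
print("score effect, adjusted for smoking:", round(adjusted[1], 3))
```

    Here the unadjusted coefficient absorbs part of the smoking effect because score and smoking are correlated. In real data, a coefficient that shrinks markedly after adjustment is consistent with smoking contributing to the association, while one that persists suggests other exposures correlated with the score may be involved.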