Environmentally sensitive hotspots in the methylome of the early human embryo

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This manuscript will be of interest to researchers interested in the influence of prenatal exposures on infant health. The authors investigate the impact of the season of conception on child DNA methylation levels in two independent cohorts from the Gambia and identify a set of CpGs that are tightly regulated during development. The data support the main conclusions of the manuscript, but some of the analyses could be improved (i.e. possible presence of residual confounding). There is also limited evidence for the functional importance of the observed associations.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

Abstract

In humans, DNA methylation marks inherited from gametes are largely erased following fertilisation, prior to construction of the embryonic methylome. Exploiting a natural experiment of seasonal variation including changes in diet and nutritional status in rural Gambia, we analysed three datasets covering two independent child cohorts and identified 259 CpGs showing consistent associations between season of conception (SoC) and DNA methylation. SoC effects were most apparent in early infancy, with evidence of attenuation by mid-childhood. SoC-associated CpGs were enriched for metastable epialleles, parent-of-origin-specific methylation and germline differentially methylated regions, supporting a periconceptional environmental influence. Many SoC-associated CpGs overlapped enhancers or sites of active transcription in H1 embryonic stem cells and fetal tissues. Half were influenced but not determined by measured genetic variants that were independent of SoC. Environmental ‘hotspots’ providing a record of environmental influence at periconception constitute a valuable resource for investigating epigenetic mechanisms linking early exposures to lifelong health and disease.

Article activity feed

  1. Author Response:

    Reviewer #1:

    This manuscript by Silver et al. details work investigating the relationship between season of conception and DNA methylation differences at sites across the genome, measured by widely used arrays, in two cohorts of children using Fourier regression. They find that season of conception is associated with persistent methylation differences at several hundred CpG sites, and that these CpGs are enriched, compared to sets of control sites, for properties that suggest that methylation at these sites is influenced very early in development/during conception and that these sites are positioned in genomic regions relevant for gene activation and regulation. Additional analyses investigated the effects of genetic variation on these sites, and found no evidence that single nucleotide polymorphisms or child sex confound the associations between season of conception and DNA methylation. As the sites measured by these arrays are a very small fraction of the total sites across the genome, the authors suggest that these findings indicate there may be many more sensitive methylation 'hotspots' in the genome that are not captured by these arrays but could impact on health/development.

    The key strengths of this manuscript include the use of two cohorts of children at different ages, providing evidence that these effects of season of conception appear to attenuate by 8-9 years of age; and comparison with control sites and additional analyses investigating confounding to build the evidence for these relationships reflecting true, biological associations rather than statistical artefacts or the result of confounding.

    However, the conclusions around the potential functional importance of these methylation differences are limited by a lack of evidence for a relationship between methylation of these season-of-conception-associated sites and child growth/development, so while this manuscript builds compelling evidence for the effects of season of conception on methylation, its functional relevance is unclear. Additionally, there are some choices made in the analyses where the rationale for those choices should be made clearer, such as the use of CpG sites above or below a certain estimated effect size for different analyses.

    Overall, the approach taken here to demonstrate different levels of evidence for true relationships between early development exposures and differences in DNA methylation is a compelling one, and the manuscript delivers clear evidence for its primary conclusions.

    We are currently researching links between several SoC-CpGs and health-related outcomes including measures of growth, and we have prepared/submitted other papers with different groups of authors (e.g. the EMPHASIS team) relating to other phenotypes. We consider a detailed analysis of links between SoC-CpGs and diverse outcome measures in Gambian children to be beyond the scope of the current study and would argue that such an analysis would dilute the central focus of this paper that is already long and complex. We do already refer to two existing studies linking Gambian SoC or nutrition-associated CpGs to health outcomes in non-Gambians (child & adult obesity/POMC, Kuhnen et al Cell Metab 2016; cancer/VTRNA2-1, Silver et al, Gen Biol 2015) in the current manuscript. The VTRNA2-1 locus does not overlap any SoC-CpGs and we already speculate that this may be due to SoC effect attenuation, since the previous association was observed in younger (3-9mth) infants. We have additionally referenced a recently published paper linking another SoC-associated locus to thyroid volume and function in Gambian children (Candler et al Sci Adv 2021) and highlighted that neither this nor the POMC locus overlap the array background analysed in this study. Finally we had already included an analysis of overlaps between SoC-CpGs and traits in published EWAS and GWAS catalogues.

    Regarding our use of different SoC amplitude thresholds for one analysis, our original motivation for analysing all 768 ‘SoC-associated CpGs’ with FDR<5% in the ENID 2yr analysis, including those with amplitude < 4%, was to explore the degree to which the strength / amplitude of SoC effects could be explained by proximity to ERV1 over the wider range of amplitudes represented by the larger set of loci. However we agree that this approach is open to question and have removed this analysis (previous Fig. 6B and Supp. Fig. 11, and text in section headed ‘Enrichment of transposable elements and transcription factors associated with genomic imprinting’). We have also removed the definition of ‘SoC-associated CpGs’ (which included CpGs with SoC amplitude < 4%) from Table 2 and Methods to aid clarity and avoid confusion.

    Reviewer #2:

    This is a very interesting manuscript, which will be of interest to a broader readership. The authors have analysed a unique cohort, which is of importance for understanding the impact of environmental factors on DNA methylation.

    The performed analysis is well balanced, and the conclusions are justified by the presented data. It is a strength of this study that results from the initial ENID study have been re-evaluated in the EMPHASIS study. Unfortunately, DNA methylation has been analysed using HM450 and EPIC arrays. Both methods provide only a limited view of methylome-wide DNA methylation.

    Another limitation (as already addressed by the authors) is the lack of longitudinal samples. This would potentially have helped to gain further knowledge about the identified attenuation of DNA methylation levels at SoC associated CpGs.

    Finally, I am not entirely sure that one confounding factor has been completely ruled out: it is known that blood composition may cause methylation variability. In general, the authors addressed this point and analysed blood compositions (supplementary Figure 16) of both cohorts. Here, no marked seasonal differences between and within both cohorts have been identified. However, the participants of the EMPHASIS cohort have a very similar age (8-9 years). For this reason, I am wondering if methylation variability/differences and in addition the attenuation of methylation levels might be influenced by the younger age of ENID participants compared to EMPHASIS study individuals.

    We agree that the necessary restriction of our analysis to data derived from Illumina 450k and EPIC arrays means that we can only obtain a limited view of DNAm loci associated with Gambian season of conception. We expect that there will be many more such hotspots across the human methylome. We have commented on this in the Discussion.

    Regarding the lack of longitudinal data to confirm the potential attenuation of SoC effects with age observed between unrelated cohorts, we are pleased to report that we have now acquired an additional EPIC array dataset covering a subset of n=138 individuals from the ENID cohort included in the main analysis. This subset had methylation measured in blood at age 5-7yrs enabling us to conduct an investigation of longitudinal methylation changes in these individuals. This analysis strongly supports the circumstantial evidence of SoC effect attenuation with age suggested by our previous comparison of the independent ENID (2yr) and EMPHASIS (7-9yr) cohorts, with:

    a) strong correlation of conception date methylation maximum between age 2yr and 5-7yrs at SoC-CpGs in these 138 individuals (Figs. 3A, 4A); and

    b) evidence of SoC effect size attenuation at the majority of SoC-CpGs (Fig. 3B; Wilcoxon signed rank sum p=10^-12).
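
    As an illustration of the paired comparison in point b), the following is a minimal sketch of a two-sided Wilcoxon signed-rank test on per-CpG effect sizes at two ages. It is not the authors' code: the function name, the made-up amplitudes and the use of a plain normal approximation (no tie or continuity correction) are all assumptions for illustration.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test using a normal approximation.

    Zero differences are dropped; tied absolute differences get average ranks.
    No tie or continuity correction is applied to the variance (a simplification).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, z, p

# Made-up SoC amplitudes (percent methylation) for 12 hypothetical CpGs.
amp_2yr = [6.1, 5.4, 7.9, 4.6, 5.0, 8.3, 4.9, 6.6, 5.8, 7.2, 4.4, 6.0]
amp_5_7yr = [4.0, 4.9, 5.1, 4.1, 3.2, 5.0, 4.2, 4.8, 3.5, 4.3, 3.3, 4.9]
w_plus, z, p = wilcoxon_signed_rank(amp_2yr, amp_5_7yr)
```

    In this toy example every amplitude is smaller at the later age, so the positive-rank sum takes its maximum value and the test rejects the null of no systematic change.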

    We note that this additional longitudinal dataset has a different confounding structure with respect to biological and technical covariates (Supp Tables 15-17) and date of sample collection (Supp. Fig. 1B), lending strong support to our previous two-cohort cross-sectional analysis.

    Regarding the potential for confounding by differences in blood cell composition, we have performed an additional sensitivity analysis with Houseman estimated blood cell counts added directly to the linear regression model for the ENID cohort (see ST1s). 518 out of the 520 estimated Fourier regression coefficients from the main analysis (1 pair of sine and cosine terms for each of the 259 SoC-CpGs) fall within the 95% confidence interval obtained in the Houseman-adjusted analysis, confirming that cell composition effects did not unduly influence SoC effect estimates in the original analysis. We have added a brief note on this and the other sensitivity analyses (batch, cell composition and village effects) in Results to the manuscript, with more details in Methods.
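
    The coverage check described above (how many main-analysis coefficients fall inside the 95% confidence interval from the adjusted model) can be sketched as follows. This is a hedged illustration only: the numbers are invented, the helper name `within_ci` is hypothetical, and a normal-theory interval (estimate ± 1.96 × standard error) is assumed rather than taken from the manuscript.

```python
def within_ci(estimate, adj_estimate, adj_se, z=1.96):
    """True if `estimate` lies inside adj_estimate +/- z * adj_se."""
    lower = adj_estimate - z * adj_se
    upper = adj_estimate + z * adj_se
    return lower <= estimate <= upper

# Invented Fourier coefficients from a "main" model ...
main_coefs = [0.020, -0.015, 0.031, 0.008]
# ... and (estimate, standard error) pairs from a cell-count-adjusted model.
adjusted = [(0.019, 0.004), (-0.013, 0.005), (0.045, 0.005), (0.007, 0.003)]

n_inside = sum(within_ci(m, est, se) for m, (est, se) in zip(main_coefs, adjusted))
print(f"{n_inside} of {len(main_coefs)} coefficients fall within the adjusted 95% CI")
```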

    If the reviewer is referring to the possibility that the SoC effect attenuation with age could be driven by different cell composition effects in the older cohort, we think that the replication of the timing of SoC effects across the 3 datasets analysed (including the additional longitudinal data; Fig. 4A), all of which have different confounding structures with respect to season of sample collection (Fig. 2A; Supp Fig. 1B), together with additional evidence of SoC effect attenuation with age in the longitudinal analysis (Fig. 3B) support this being a genuine age attenuation effect.

    Reviewer #3:

    Silver et al. investigate the influence of seasonal variation (nutrition, infection, environment) on blood DNA methylation in two cohorts of children (233 [2y] and 289 [8y-9y]) from the same subsistence farming communities in rural Gambia. One cohort (450K, 233) was extensively studied before in multiple publications; the second dataset (850k, 289) is unpublished. Using cosinor modeling they find 768 CpGs with a significant seasonal pattern (SoC-CpG, FDR<0.05) in the probes that overlap between the 450k and 850k arrays. Look-up of these 768 SoC-CpGs in the second sample showed 61 SoC-CpGs with FDR<0.05 (no mention is made of whether the direction of effect is consistent, but we assume it is so).

    In fact we did report that the ‘direction’ of the effect (conception date at methylation maximum) is highly consistent with increased DNAm in conceptions at the peak of the rainy season across the two cohorts at the 61 SoC-CpGs with FDR<0.05 – see Fig. 2C.
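
    To make the notion of effect 'direction' concrete: in a cosinor (single-pair Fourier) model, methylation is regressed on one sine and one cosine term of conception date, and the two fitted coefficients together give an amplitude and the date of the methylation maximum. The sketch below is illustrative only and is not the authors' pipeline; `fit_cosinor`, the 365-day period, the noise-free synthetic data and the absence of covariates are all assumptions.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def fit_cosinor(days, meth, period=365.0):
    """Least-squares fit of meth ~ b0 + bs*sin(theta) + bc*cos(theta).

    Returns (intercept, semi_amplitude, peak_day), where semi_amplitude is half
    the peak-to-trough distance and peak_day is the day of the fitted maximum.
    """
    rows = [[1.0,
             math.sin(2 * math.pi * d / period),
             math.cos(2 * math.pi * d / period)] for d in days]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, meth)) for i in range(3)]
    b0, bs, bc = solve3(xtx, xty)
    semi_amplitude = math.hypot(bs, bc)
    # bs*sin(t) + bc*cos(t) = R*cos(t - phi) with phi = atan2(bs, bc)
    peak_day = (math.atan2(bs, bc) * period / (2 * math.pi)) % period
    return b0, semi_amplitude, peak_day

# Synthetic, noise-free example with a maximum at day 210 of the year.
days = list(range(0, 365, 5))
meth = [0.50 + 0.03 * math.cos(2 * math.pi * (d - 210) / 365.0) for d in days]
intercept, semi_amp, peak_day = fit_cosinor(days, meth)
```

    Comparing `peak_day` for the same CpG across datasets is one way to express whether the timing of the seasonal effect replicates, independently of whether its amplitude does.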

    The authors notice that most SoCs seem to be attenuated in the 8-9y sample. Then the authors select out of the 768 SoC-CpG the FDR<0.05 and >=4% seasonal amplitude in this discovery sample: 257 which they bring further in (enrichment) analyses. It is unclear if all 257 are (nominally) significant in the replication sample.

    We did not check this because of evidence that, despite strong replication of effect direction (Fig. 4A), the amplitude of the SoC effect attenuates with age (Fig. 2E). This means that it would not be surprising if one or more SoC-CpGs failed to achieve nominal significance in the older cohort. This is now strongly supported by our additional analysis of longitudinal data confirming SoC effect attenuation with age and consistency of SoC effect direction (Figs. 3B and 4A).
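
    The two-step selection discussed here (an FDR threshold followed by a minimum seasonal amplitude) can be sketched as below. The source does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption, as are the CpG identifiers, p-values and amplitudes, which are invented for illustration.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

# Hypothetical CpGs: (p-value, seasonal amplitude as a proportion; 0.04 = 4%).
results = {
    "cpg_a": (0.001, 0.06),
    "cpg_b": (0.004, 0.02),
    "cpg_c": (0.020, 0.05),
    "cpg_d": (0.200, 0.07),
    "cpg_e": (0.600, 0.01),
}
ids = list(results)
qvals = benjamini_hochberg([results[i][0] for i in ids])
selected = [i for i, q in zip(ids, qvals) if q < 0.05 and results[i][1] >= 0.04]
print(selected)
```

    Here `cpg_b` passes the FDR threshold but is dropped for its small amplitude, which is the kind of locus that distinguished the 768 FDR-significant CpGs from the smaller amplitude-filtered set.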

    These SoC-CpGs are enriched for imprinted and oocyte germline loci. Roughly 10% of SoC-CpGs overlap with so-called metastable epialleles (MEs), on which the authors have published extensively. This is a large fold enrichment, and subsequently the main focus of the Results and Discussion. Indeed, it skews the Discussion heavily and one wonders what could have been found in the other 90%?

    Our strategy throughout the Results and Discussion was to focus on characteristics including metastability, parent of origin-specific methylation, histone modifications and gametic and early embryo methylation patterns that suggest a link to establishment of methylation states in the early embryo at SoC-CpGs. For these analyses all SoC-CpGs were considered at every stage and metastability was not the primary focus. However, as the reviewer suggests, we do repeatedly point out that many of the above contextual characteristics that are associated with SoC-CpGs have also been associated with metastability which we consider to be worthy of note, in part because it suggests that many SoC-CpGs may in fact be MEs, despite not having been previously identified as such. We have further cause to believe this could be the case because of i) the typically small sample size of multi-germ layer/tissue datasets used to screen for MEs, meaning that published screens for human MEs are likely to be underpowered and will hence fail to capture most MEs; and ii) the evidence that we present suggesting that environmentally-driven inter-individual variation at loci exhibiting ME-like properties may diminish with age, again suggesting that ME screens, which largely analyse adult tissues, will miss metastable loci present in infancy and early childhood.

    We had already made the point ii) above in the Discussion. However, given the reviewer’s concerns we have added an additional comment on point i).

    The Discussion is heavily geared to interpretation within their ME focus and does little to discuss study weaknesses and strengths, of which the tail of the Results suggests there are several. At the end of the Results and in the Methods we find additional sensitivity analyses and discussion points on a very strong enrichment for CpGs with a mean difference in methylation between the sexes (>1/3 of the 257), adjustments for genetic confounding and a high inflation factor in the discovery cohort.

    We have added an additional comment on the need for further functional analysis in cell and/or animal models at the end of our discussion on possible mechanisms underpinning the observed strong enrichment for sex effects at loci associated with periconceptional environment. We have performed an additional analysis of SoC effects on global methylation using predicted LINE1 and Alu element methylation to address the issue of genomic inflation in the discovery cohort (Methods ‘Inflation of test statistics’ and additional Supp. Fig. 14). We have commented on the potential for residual genetic confounding and the limitation of a lack of genetic data in the discovery cohort in the Discussion. We have also provided an additional comment on the potential influence of unmeasured inter-relatedness in our study population.
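
    For readers unfamiliar with the genomic inflation factor mentioned here, it is conventionally computed as the median of the observed 1-df chi-square test statistics divided by the median of the null chi-square distribution (about 0.455). The sketch below uses simulated z-scores and is not derived from the study data; the helper name and the way the null and inflated statistics are generated are assumptions.

```python
import statistics
from statistics import NormalDist

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(z_scores):
    """Genomic inflation factor (lambda) from per-probe z-scores."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN

# Evenly spaced standard-normal quantiles stand in for well-calibrated statistics.
n = 1001
null_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
lambda_null = inflation_factor(null_z)

# Scaling every z-score by sqrt(1.3) inflates all chi-square statistics by 1.3.
inflated_z = [z * 1.3 ** 0.5 for z in null_z]
lambda_inflated = inflation_factor(inflated_z)
```

    A value near 1 indicates statistics consistent with the null for most probes; values well above 1 can reflect confounding, but also a genuinely widespread signal, which is the distinction at issue in this exchange.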

    Indeed, despite the strong and good flow of the Results section and the impressive (albeit somewhat one-sided) look-up of SoC-CpGs in published datasets, the tail of the Results and the Methods section leave this reader with a strong suspicion of possible methodological issues on the measurement level already identified prior.

    The authors report that the discovery cohort is biased in the collection of conception months (figure 2A), has a strong inflation of 1.3 (no QQ-plot is shown to assess bias in addition to inflation), that no adjustment for genetic background could be made (which is false, as the 450k array contains several dedicated SNP probes, even hundreds when extracted with the omicsPrint package) and that > 1/3 of SoC-CpGs are sex CpGs. For the latter observation the authors regressed out sex and repeated the analysis, noting no difference. However, regressing out sex does not help if sex is heavily correlated with confounding biological/sampling/technical covariates.

    The authors reason that the inflation is nothing to worry about, citing single cohort studies on global effects of methyl donors on DNAm. Global DNAm is indeed often associated with methyl donor intake, but generally these studies investigate ALU or SINES repetitive elements, and the PACE consortium reported only modest effects on select 450K array loci for prenatal folate supplementation, showing that their reasoning might hold for the ME loci (in/close to repetitive elements) but not for the genome-wide analysis per se.

    The authors should convince the reader that their (discovery) data are valid. The data they do show in Supplemental tables 16 and 17 show that after functional normalization a strong effect of batches remains, while in my own experience these are normally nicely mitigated by functional normalization. Normally only strong cell type correlations remain in the first PCs of the normalized data. But for ENID we see a remainder of sentrix row, often the strongest batch effect, with slide and plate also remaining. Also, the biological, season and cohort-specific variables are not noted here. We just must assume that the blanket correction for the first 6 PCs, rather than the actual adjustment for the measured batch/confounding effects, does not remove (or over-adjust for) biological/study design (village, genetic ancestry) effects. In addition to these observations, figure 2C seems to indicate that the control CpGs (elegantly selected by the authors) also show seasonal variation, just not as much as the SoC-CpGs. This leaves the reader to wonder: is there bias in their sample randomization across plates, rows and slides? This feeling is amplified by the fact that almost all SoC-CpGs seem to show an increase in DNAm in Jul-Aug (Suppl Fig. S5 and Figure 1B), an observation that is not given enough prominence in the Results. This might or might not hint at a correlation with a batch effect (like sentrix row?).

    Our addition of a third longitudinal dataset with a very different confounding structure provides strong reassurance of the robustness of the reported SoC effects. However we recognise many of the concerns raised by the reviewer and have therefore substantially extended our analysis of potential confounders, including additional sensitivity analyses (see Supplementary Tables ST1p-1s).

    In our extended analysis of possible confounding of technical and biological covariates by SoC, we note that the majority of batch and biological covariates are categorical, so it was not possible to report correlation coefficients (rho). We have instead reported p-values for corresponding association tests – see Supplementary Tables for further details of tests that were carried out. Also note that for simplicity season of conception is modelled as a binary variable (Dry: Jan-Jun; Rainy: July-Dec). We consider this to be a valid approximation to the main cosinor (Fourier) regression analysis since this showed a clear relationship between DNAm and dichotomised (Dry/Rainy) season of conception (Figs 2D & 4A). Note that we have not included month of collection as this completely confounds season of conception in the main ENID (2yr) analysis and cannot confound the EMPHASIS (7-9yr) analysis, as discussed in the manuscript (Fig. 2A). This is a key reason why we compared SoC effects across these two cohorts. Note that the month of collection also cannot confound the ENID 5-7yr (longitudinal) analysis as all samples are collected in the rainy season (additional Supp. Fig. 1B).
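
    A minimal sketch of this kind of check, under stated assumptions: conception month is dichotomised into Dry (Jan-Jun) and Rainy (Jul-Dec) as described above, and association with a two-level categorical covariate is tested with a Pearson chi-square test on a 2x2 table. The choice of test, the "plate" covariate and all data values are hypothetical; the tests actually used are documented in the Supplementary Tables, not here.

```python
import math

def season_of_conception(month):
    """Dry: Jan-Jun (months 1-6); Rainy: Jul-Dec (months 7-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return "Dry" if month <= 6 else "Rainy"

def chi2_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value, 1 df."""
    (a, b), (c, d) = table
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / margins
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Invented example: conception month and array plate (1 or 2) for 8 samples.
months = [1, 3, 5, 6, 7, 8, 10, 12]
plates = [1, 2, 1, 2, 1, 2, 1, 2]

counts = {("Dry", 1): 0, ("Dry", 2): 0, ("Rainy", 1): 0, ("Rainy", 2): 0}
for m, pl in zip(months, plates):
    counts[(season_of_conception(m), pl)] += 1
table = [[counts[("Dry", 1)], counts[("Dry", 2)]],
         [counts[("Rainy", 1)], counts[("Rainy", 2)]]]
chi2, p = chi2_2x2(table)
```

    With samples from both seasons spread evenly over both plates, as in this toy table, the statistic is zero; an unbalanced layout would raise it and flag a possible batch-by-season association.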

    The covariate correlation analysis confirms:

    • No correlation between SoC and all considered batch and biological covariates including principal components across all three analysed datasets (Supp Table, ST1p-1r).

    • No correlation between sex and all considered batch and biological covariates; weak correlations with PC4 and PC3 in EMPHASIS and ENID 5-7yr datasets respectively (ST1q,1r); note also that the sex sensitivity analysis previously reported in the manuscript used methylation values that were pre-adjusted for sex using a regression model that included sex as the only adjustment covariate, alleviating concerns that there may be residual confounding due to strong correlations between technical/biological/sampling covariates and sex. We have added some additional comments on this to Results.

    • Expected strong correlations between SoC, month of conception and month of birth in all datasets (ST1p-1r).

    • Functional Normalisation (FN) removed most but not all of the effects of technical batch effects (sample plate, slide etc) from the DNAm array data used in the main ENID analysis (ST1p).

    • Samples are not perfectly randomised across 450k sample plate (month of birth [mob] and conception [moc]) and slide (mob and village) for the ENID 2yr cohort (ST1p).

    The last point raises the possibility of potential residual confounding due to array batch effects in the ENID analysis. We checked for this in two ways. First, we performed sensitivity analyses with batch and village ID variables included directly in the linear regression models, in addition to the PCs that served as proxies for batch variables in our original analysis. This suggested no residual confounding due to array batch or village ID effects (ST1s: ‘batch adjusted model’ and ‘village adjusted model’). Second, we confirmed that neither mob, moc nor village ID were associated with batch or any other covariates in the EMPHASIS or new ENID 5-7yr analyses (ST1q, ST1r). The tight correspondence of date of methylation maximum across all three datasets (cross-cohort and longitudinal analyses) (Figs. 2C, 3A and 4A) with different confounding structures (ST1p-1r) strongly suggests that the reported SoC associations are not driven by residual confounding.

    In summary, this analysis provides strong reassurance that our main analysis is not confounded by residual associations with technical and/or biological covariates considered in this analysis, and that the observed enrichment for previously identified sex-associations amongst SoC-CpGs is not driven by residual confounding due to sex.

    We have made multiple amendments to the manuscript to incorporate the longitudinal analysis; in the Introduction (lines 58-9); in the first section of Results; and we have made particular reference to the alignment of SoC effects across 3 datasets with different confounding structures. We have also amended several figure captions to distinguish the ENID 2yr and 5-7yr datasets and added the longitudinal dataset to Methods and to the study design schematic (revised Fig. 1), and visualised key results from this additional analysis in Figs. 3 and 4A. Finally we have added additional text on the sensitivity analyses in the main text and in Methods.
