Diverse ancestry whole-genome sequencing association study identifies TBX5 and PTK7 as susceptibility genes for posterior urethral valves

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    Prior work has linked posterior urethral valves (PUV), a common cause of end stage renal disease in children, with chromosomal abnormalities and rare copy number variants, but the genetic causes of PUV remain incompletely defined. In this study, the authors have used diverse ancestry whole-genome sequencing association studies to identify two novel genes and an enrichment of rare duplications and inversions affecting candidate cis-regulatory elements as possible causes of this rare condition, illustrating the potential for this approach to other rare conditions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

This article has been Reviewed by the following groups

Abstract

Posterior urethral valves (PUV) are the commonest cause of end-stage renal disease in children, but the genetic architecture of this rare disorder remains unknown. We performed a sequencing-based genome-wide association study (seqGWAS) in 132 unrelated male PUV cases and 23,727 controls of diverse ancestry, identifying statistically significant associations with common variants at 12q24.21 (p=7.8 × 10⁻¹²; OR 0.4) and rare variants at 6p21.1 (p=2.0 × 10⁻⁸; OR 7.2), that were replicated in an independent European cohort of 395 cases and 4151 controls. Fine mapping and functional genomic data mapped these loci to the transcription factor TBX5 and planar cell polarity gene PTK7, respectively, the encoded proteins of which were detected in the developing urinary tract of human embryos. We also observed enrichment of rare structural variation intersecting with candidate cis-regulatory elements, particularly inversions predicted to affect chromatin looping (p=3.1 × 10⁻⁵). These findings represent the first robust genetic associations of PUV, providing novel insights into the underlying biology of this poorly understood disorder and demonstrate how a diverse ancestry seqGWAS can be used for disease locus discovery in a rare disease.

Article activity feed

  1. Evaluation Summary:

    Prior work has linked posterior urethral valves (PUV), a common cause of end stage renal disease in children, with chromosomal abnormalities and rare copy number variants, but the genetic causes of PUV remain incompletely defined. In this study, the authors have used diverse ancestry whole-genome sequencing association studies to identify two novel genes and an enrichment of rare duplications and inversions affecting candidate cis-regulatory elements as possible causes of this rare condition, illustrating the potential for this approach to other rare conditions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

  2. Reviewer #1 (Public Review):

    The congenital condition posterior urethral valves (PUV) is a major cause of end stage renal disease in young males. While prior work has partially characterized the genetic landscape of this condition, its pathogenesis remains poorly understood, so any new insights will be of broad interest to pediatric nephrologists, urologists, geneticists and developmental biologists. The study by Chan et al. makes a significant contribution to this story. Here, the authors have used for the first time a diverse ancestry whole-genome sequencing approach to tackle the problem and have identified variants within/near two genes, TBX5 and PTK7, as being significantly associated with this condition in both their original cohort and a replication study. The data are compelling and are a good example of the power of applying a diverse ancestry approach to disease locus discovery in rare disease. They also were able to use this approach to fine map variants inferred to be causal. This study's use of WGS provided other advantages: they could identify rare exonic variants and small structural variants missed by conventional microarrays. This strategy resulted in two additional observations: 1) none of the genes previously associated with congenital bladder outflow obstruction were associated with PUVs, showing that monogenic causes of PUVs are rare; 2) an enrichment in PUV cases of rare inversions affecting candidate cis-regulatory elements, with the strongest signal for inversions affecting CTCF-only regions.

    While the genetic associations appear robust, there are a number of weaknesses to this study. The most obvious and important one is that all of the findings are associative, and none are experimentally validated. The authors nicely use bioinformatic methods to show that the variant near TBX5 may map into the same topologically associated domain, but they provide no direct evidence that this variant directly affects TBX5 expression. The closest they come to providing any link is by showing possibly permissive expression of TBX5 in relevant tissues. Likewise, they suggest that the intronic variant in PTK7 may disrupt the binding domain for at least two transcription factors, though neither is experimentally evaluated, and they provide no direct evidence showing that this variant affects the expression of PTK7. It should also be noted that their immunohistochemical studies of human fetal tissue for TBX5 and PTK7 are not convincing. There appears to be widespread staining of multiple cell types, suggesting either very broad expression of both genes or poor specificity of the primary antibodies. There is, of course, no reason that a broadly expressed gene cannot have organ or tissue-specific effects when its activity is altered, but these data do not provide compelling evidence that either TBX5 or PTK7 is functionally important in this condition. Further highlighting the importance of this issue, PUVs have not been described as a clinical manifestation of disease associated with mutations of either gene in humans. Finally, it would be useful for the authors to discuss how variants in either gene or in the patterns of structural variants that they found associated with PUV intersect with sex to result in this exclusively male condition.

  3. Reviewer #2 (Public Review):

    Novel aspects include a new strong genetic signal at the TBX5 locus encoding a plausible candidate gene for PUV, and a weaker signal for a rare allele at the PTK7 locus. Some of the limitations include the fact that the study was severely underpowered for detection of rare variant associations, the signal at the PTK7 locus is barely meeting the significance threshold (may still represent a false positive given the number of multiple tests and the presence of genomic inflation), and the provided replication study is suboptimal given differences in the genotyping between cases and controls, imbalanced design, and lack of ancestry matching and adjustment. Beyond protein expression studies in human embryos, there are no additional experimental studies provided in support of the causal role of TBX5 and PTK7 genes in susceptibility to PUV.

    Major:

    1. The replication study is problematic given that different genotyping methods are used for cases (targeted KASP) versus controls (WGS). This may introduce differential bias. Moreover, the ancestry of the control cohort (UK-based) does not seem to be well matched to the cases (predominantly German and Polish), and the lack of genome-wide data for the cases precludes proper adjustment for population stratification. The case-control design is also imbalanced in the replication study. The authors should reconsider their replication strategy to include a more balanced cohort with ancestry-matched controls and uniform genotyping. As an alternative, genome-wide genotyping of the replication case cohort would significantly enhance the study, and should be considered.

    2. I am reassured that the TBX5 signal remains genome-wide significant in European-only analysis. However, the signal at PTK7 appears much less robust - it has borderline statistical significance (especially given that the authors test for all rare and common variants across the genome) and is represented by a single variant with a relatively rare risk allele that is differentially distributed by ancestry. Therefore, I would like to see more information for this specific signal:

    Information on the depth of coverage and the quality of the top variant

    Information on whether the top PTK7 variant remains genome-wide significant after application of genomic control. Of note, the calculation of genomic inflation is dependent on sample size - lambda of 1.05 may represent an underestimate given low power of the cohort, and this point deserves at least a comment. Some methods correcting lambda for sample size have been proposed, and the authors should consider applying these methods.

    This locus requires more robust replication as discussed above. If more robust replication study is not possible, additional functional studies could provide more evidence in support of this locus.

    3. There is no validation of sensitivity and specificity of SV detection by variant size or type (e.g. inversions, deletions, duplications). Also, since burden differences are not replicated independently, the authors should stress the exploratory nature of these analyses.

    4. In the discussion (especially second paragraph, but also throughout), the authors overemphasize multi-ancestry nature of their study. The reality is that the included non-Europeans are very small in numbers (18 SAS cases, 11 AFR cases, and 14 admixed cases). I would suggest for the authors to specifically state these case counts and make it clear that expanded efforts to recruit non-Europeans are still needed given these very low numbers. Supplemental figure 2: provide case-control counts in each ancestral group (Y axis). Supplemental figure 3 is misleading since allelic frequencies in the cases are pooled and are not available individually for all depicted populations.

    5. I did not see details of chr. X analysis. This is important given that the case group involves only Males and control group involves both Males and Females. Also, please explain how sex was used as a fixed effect (as stated in the methods) given that the case cohort is 100% male.

  4. Reviewer #3 (Public Review):

    In this manuscript, the authors attempt to identify risk factors for PUV, a rare disease with unclear pathophysiology. The study design is a well-designed GWAS, although performed on sequence data rather than SNP array data with imputation; the sequence data also allows for study of structural variants. Strengths of the study include an exemplary design and analytical approach, as well as the novelty of applying a GWAS to a rare disease. Weaknesses include a somewhat thin exposition as to what is known and unknown about the genetic architecture of PUV, some omitted analyses that could further elucidate the genetic basis of PUV, and some results in the latter half of the manuscript that have unclear impact.

    I believe that the primary objective of the study was achieved -- the reported genes have reasonable evidence as candidate genes and the association signals nearby them seem to be robust. I am not familiar with PUV but if these are some of the first genes identified for the disease, they may have a significant impact on the PUV research field. They do face the same limitations of any gene identified from a GWAS, however, in that the evidence implicating them in PUV is still circumstantial, and there is a long way to go to demonstrate the mechanism linking them to disease or whether they or other genes in the same pathway could be targeted by therapeutics.

    More generally, while the GWAS methodology applied is not particularly novel, the scenario of applying it to a rare disease is innovative and of value -- as we become increasingly aware that the dividing line between rare and common diseases may be blurry, GWAS for rare disease (and, conversely, sequencing studies for common disease) are important data points for advancing the field. Rare diseases are traditionally studied through very different approaches than are common diseases, so bringing rigorous statistics and analytical approaches to a rare disease is of value to the field.

  5. Author Response

    Reviewer #1 (Public Review):

    It should also be noted that their immunohistochemical studies of human fetal tissue for TBX5 and PTK7 are not convincing. There appears to be widespread staining of multiple cell types, suggesting either very broad expression of both genes or poor specificity of the primary antibodies.

    We appreciate the reviewer’s comment that the immunohistochemistry staining does not provide definitive evidence for the functional importance of TBX5 and PTK7 in PUV; however, these images do confirm that the proteins are ‘in the right place at the right time’ during normal human urinary tract development. We have updated the discussion on page 19, line 441-445 to emphasise this. To further support a putative role for these proteins in urinary tract development, we have added additional images from a second human embryo at the same gestation, which confirm these distinct patterns of staining (Figure 8 – figure supplement 1 on page 14, lines 313-317). Even if these proteins can also be detected in other tissues or cell types, this does not detract from this idea, as in other locations the proteins may have redundant or different roles.

    PUVs have not been described as a clinical manifestation of disease associated with mutations of either gene in humans.

    The reviewer is correct that rare variants affecting TBX5 and PTK7 have not previously been associated with PUV. They have however been associated with other developmental anomalies (as stated in the discussion on page 18, line 408-411 and page 19, line 434-437) confirming a clear role in embryonic development for both these genes.

    The fact that rare variant association testing did not identify an increased burden of rare, likely deleterious variants in these two genes (although with limited power in this cohort) suggests that PUV is not driven by ultra-rare, highly penetrant alleles in these genes. However, the identification of common and low-frequency variants using GWAS suggests a complex mode of inheritance for PUV likely in combination with maternal/in utero factors. As with other complex traits, these signals provide potential insights into the underlying biology of this disease as opposed to the diagnostic implications of conventional monogenic gene discovery associated with purely Mendelian conditions. A paragraph on the Mendelian/complex trait implications of the findings of the study has been incorporated into the discussion (page 21-22, line 594-502).

    Discuss how variants in either gene or in the patterns of structural variants that they found associated with PUV intersect with sex to result in this exclusively male condition.

    The fact that PUV is a uniquely male disease is most likely the result of differences in urethra and bladder development and length differences in urethra between males and females. Sex hormones may also potentially result in tissue-specific differences in gene expression (Ober, Loisel, and Gilad 2008). We have added a paragraph into the discussion to clarify this (page 20, line 454-463) as well as clarified the results of the chromosome X and sex-specific analyses (page 7, lines 149-155; see also Reviewer 2, point 5 below) as suggested.

    Reviewer #2 (Public Review):

    Major:

    1. The replication study is problematic given that different genotyping methods are used for cases (targeted KASP) versus controls (WGS). This may introduce differential bias. Moreover, the ancestry of the control cohort (UK-based) does not seem to be well matched to the cases (predominantly German and Polish), and the lack of genome-wide data for the cases precludes proper adjustment for population stratification. The case-control design is also imbalanced in the replication study. The authors should reconsider their replication strategy to include a more balanced cohort with ancestry-matched controls and uniform genotyping. As an alternative, genome-wide genotyping of the replication case cohort would significantly enhance the study and should be considered.

    Many thanks to the reviewer for their valuable comments regarding the replication study case-control cohort. While different sequencing technologies were used to compare allele counts at the lead variants in the replication study (KASP genotyping for cases vs WGS for controls), both techniques exhibit > 99.5% accuracy and are subjected to variant level quality control metrics. Only individuals with reliably called genotypes were included in the replication analysis. This has been clarified in the methods section (page 30, line 693).

    We were able to obtain genome-wide genotyping data for 204 of the 395 European cases in the replication cohort. While (despite sustained effort on our part) we were unable to analyze this data jointly with the control cohort in the 100KGP due to enforced limitations on data sharing, we were able to demonstrate similar ancestry of the replication study cases and controls: we performed PCA on a set of ~80,000 overlapping autosomal, high-quality, LD-pruned variants with MAF > 10% and projected the cases and controls separately onto (the same) data from the 1000 Genomes Project (Phase 3) labelled by ‘population’ (Figure 5). This clearly demonstrates that both cohorts have homogeneous European ancestry, as stated now in the results (page 8, lines 166-168).

    We note with thanks the reviewer’s comments regarding the case-control imbalance in the replication study which can sometimes result in a type 1 error. To address this, the case control ratio was reduced from 1:27 to 1:10.5 by including only the 4,151 male controls from the cancer cohort of the 100KGP. The results remained significant for both lead variants and have been updated in the manuscript (page 8, line 162-176; Table 2).

    When the number of controls was reduced to 500 males (a case:control ratio of 1:1.3), rs10774740 (TBX5 locus) remained significant, demonstrating that case-control imbalance was not driving the observed signal (P=9.9 × 10⁻³; OR 0.77; 95% CI 0.63-0.94). rs144171242 (PTK7 locus), however, did not reach significance due to insufficient power (P=0.06; OR 2.24; 95% CI 0.93-5.36). For a rare variant such as rs144171242 (MAF ~ 1%), a replication study with 500 controls is only powered to detect association with large effect size (OR > 3.5). A case:control ratio of ~1:10 is therefore optimal to maximize power to detect association, while minimizing unnecessary noise from excess controls. This has been added to the results section of the manuscript (page 8-9, lines 178-184).
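
    The case:control ratios quoted in this part of the response can be reproduced with a few lines of arithmetic. The sketch below uses only the counts given in the text (395 replication cases, 4,151 male controls, and the reduced set of 500 controls); the helper name is illustrative and not taken from the authors' analysis.

```python
# Case:control ratios quoted in the replication analysis.
# Counts come from the text; the helper name is illustrative.

def controls_per_case(n_cases: int, n_controls: int) -> float:
    """Return how many controls are available for each case."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return n_controls / n_cases


REPLICATION_CASES = 395

male_controls = controls_per_case(REPLICATION_CASES, 4151)
reduced_controls = controls_per_case(REPLICATION_CASES, 500)

print(f"male-only controls: 1:{male_controls:.1f}")
print(f"500-control subset: 1:{reduced_controls:.1f}")
```

    Both values round to the ratios stated in the response (1:10.5 and 1:1.3).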

    2. I am reassured that the TBX5 signal remains genome-wide significant in European-only analysis. However, the signal at PTK7 appears much less robust - it has borderline statistical significance (especially given that the authors test for all rare and common variants across the genome) and is represented by a single variant with a relatively rare risk allele that is differentially distributed by ancestry. Therefore, I would like to see more information for this specific signal:

    Information on the depth of coverage and the quality of the top variant

    This has been incorporated into the manuscript for both lead variants (Page 7, lines 142-145). For rs144171242 at the PTK7 locus, the meanDP was 29.34 and the meanGQ was 75.59.

    Information on whether the top PTK7 variant remains genome-wide significant after application of genomic control. Of note, the calculation of genomic inflation is dependent on sample size - lambda of 1.05 may represent an underestimate given low power of the cohort, and this point deserves at least a comment. Some methods correcting lambda for sample size have been proposed, and the authors should consider applying these methods.

    We appreciate the reviewer’s comments that the value of lambda may be affected by sample size and have added a comment to this in the manuscript (Page 7, line 136-137). Despite extensive searching, we were unable to find a recent published example of how to correct lambda for sample size and would be grateful if the reviewer could suggest a reference for this.
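
    For illustration only: one rescaling that is sometimes reported in GWAS papers is λ1000, which re-expresses the inflation factor for a hypothetical study of 1,000 cases and 1,000 controls. The sketch below applies it to the discovery cohort counts from the abstract (132 cases, 23,727 controls) and the lambda of 1.05 quoted by the reviewer. It is not part of the authors' analysis, and whether this is the correction the reviewer had in mind is an assumption.

```python
# Sketch of the lambda_1000 rescaling (an assumption about what a
# "sample-size corrected lambda" could mean; not the authors' method).

def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls."""
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("sample sizes must be positive")
    study_term = 1.0 / n_cases + 1.0 / n_controls
    reference_term = 1.0 / 1000 + 1.0 / 1000
    return 1.0 + (lam - 1.0) * study_term / reference_term


# Inputs taken from the text: lambda = 1.05, 132 cases, 23,727 controls.
rescaled = lambda_1000(1.05, 132, 23727)
print(f"lambda_1000 = {rescaled:.2f}")
```

    Because the number of cases is far below 1,000, the rescaled value is larger than the raw lambda, which is the direction of the concern raised by the reviewer.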

    To answer the reviewer’s specific question, application of genomic control to the lead variant at PTK7 results in P=4.37 × 10⁻⁸ which remains below the threshold for conventional genome-wide significance. However, while the genomic inflation factor provides a reasonable indication of possible confounding by population structure, there are recognized limitations to applying it as a corrective factor as it assumes that all variants are confounded, i.e., the same correction is applied irrespective of differences in population allele frequency which can be insufficient for some variants and lead to a loss of power in others. Furthermore, in addition to sample size, lambda can vary with heritability and disease prevalence (Yang et al. 2011) and its use for correction can therefore be too conservative and reduce power to detect significant associations. In this manuscript we therefore chose to use the mixed model approach (as part of SAIGE – detailed in the methods on page 28, lines 647-648), which has largely superseded older methods such as genomic control, to robustly correct for both population structure and cryptic relatedness and minimize false positive associations (Shin and Lee 2015).
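
    The genomic control adjustment discussed here can be sketched as follows: convert the P value to a 1-degree-of-freedom chi-square statistic, divide by lambda, and convert back. The inputs are the rounded figures from the text (P=2.0 × 10⁻⁸ at the PTK7 locus, lambda of 1.05), so the result will only approximate the value the authors report. The function names are illustrative, and the assumption that the test statistic is 1-df chi-square distributed is ours.

```python
import math

# Genomic control sketch using only the standard library.
# Assumes a 1-df chi-square test statistic; names are illustrative.

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square(1) variable."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_isf_1df(p: float, lo: float = 0.0, hi: float = 200.0) -> float:
    """Invert chi2_sf_1df by bisection (the tail probability is decreasing)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_1df(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def genomic_control_p(p: float, lam: float) -> float:
    """Return the P value after dividing the chi-square statistic by lambda."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lam = max(lam, 1.0)  # common convention: never deflate the statistic
    return chi2_sf_1df(chi2_isf_1df(p) / lam)


adjusted = genomic_control_p(2.0e-8, 1.05)
print(f"adjusted P = {adjusted:.2e}")
print("below 5e-8:", adjusted < 5e-8)
```

    With these rounded inputs the adjusted P value stays under 5 × 10⁻⁸, in line with the authors' statement; the exact digits differ because the unrounded statistic is not given in the text.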

    This locus requires more robust replication as discussed above. If more robust replication study is not possible, additional functional studies could provide more evidence in support of this locus.

    Please refer to point 1 regarding the revised and more robust evidence of replication.

    3. There is no validation of sensitivity and specificity of SV detection by variant size or type (e.g. inversions, deletions, duplications). Also, since burden differences are not replicated independently, the authors should stress the exploratory nature of these analyses.

    We appreciate the reviewer’s comment that there is no independent validation of SV detection (e.g., by microarray or long-read sequencing) and this was reported as a limitation of our study in the discussion (page 22-23, line 520-524). However, one of the main strengths of this study is the use of clinical-grade WGS data where all samples have been sequenced on the same platform and undergone variant calling using the same bioinformatics pipeline. This essentially eliminates confounding due to differences in data generation and processing and the sensitivity and specificity of SV detection will therefore be the same for both cases and controls.

    We agree with the reviewer that the SV analyses have not yet been replicated independently and, as they suggest, have stressed the exploratory nature of the findings in the discussion (page 21, line 491-493).

    4. In the discussion (especially second paragraph, but also throughout), the authors overemphasize multi-ancestry nature of their study. The reality is that the included non-Europeans are very small in numbers (18 SAS cases, 11 AFR cases, and 14 admixed cases). I would suggest for the authors to specifically state these case counts and make it clear that expanded efforts to recruit non-Europeans are still needed given these very low numbers.

    We appreciate the reviewer’s comment about the overemphasis on the multi-ancestry nature of the study and the small absolute numbers of individuals included. However, as a proportion of the cohort, a third of the cases are non-European: 14% are of South Asian ancestry, 8% are of African ancestry and 11% are admixed. This is a greater proportion of non-white European ancestry individuals than in the UK as a whole (DOI: 10.5257/census/aggregate-2001-2), where the discovery cohort was based. This provides evidence that our study eliminates at least some of the Euro-centric bias present in existing genetic and genomic literature, at least as far as the UK population is concerned. Clearly, global studies fairly representing all populations would be needed to address this issue perfectly. The case counts were reported in Table 1 but we have now referenced the low absolute numbers and included the reviewer’s suggestion about expanding efforts to recruit non-European populations in the main text (page 22, line 518-520). We have also edited paragraph two of the discussion in response to the reviewer’s comments (page 17, line 387-398).
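
    The percentages in this response follow from the case counts quoted by the reviewer (18 SAS, 11 AFR, 14 admixed) and the 132 discovery cases given in the abstract. A minimal check, with illustrative variable names:

```python
# Share of non-European cases in the discovery cohort.
# Counts are those quoted in the review; 132 is the case total from the abstract.

TOTAL_CASES = 132
non_european_cases = {"South Asian": 18, "African": 11, "Admixed": 14}

shares = {
    group: 100.0 * count / TOTAL_CASES
    for group, count in non_european_cases.items()
}
combined_share = 100.0 * sum(non_european_cases.values()) / TOTAL_CASES

for group, share in shares.items():
    print(f"{group}: {share:.0f}%")
print(f"All non-European: {combined_share:.1f}%")
```

    The combined share is just under one third, which matches the authors' summary while also making the small absolute counts visible.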

    Supplemental figure 2: provide case-control counts in each ancestral group (Y axis).

    These have been added to the figure legend of Figure 6 – supplemental figure 4 (previously Figure 5 - supplemental figure 2).

    Supplemental figure 3 is misleading since allelic frequencies in the cases are pooled and are not available individually for all depicted populations.

    Figure 5 - supplemental figure 3 has been removed and replaced by Figure 6 – supplemental figure 3 to show only the individual case, control and gnomAD AF by ancestry for AFR, SAS and EUR population groups instead of using the pooled allele frequencies.

    5. I did not see details of chr. X analysis. This is important given that the case group involves only Males and control group involves both Males and Females. Also, please explain how sex was used as a fixed effect (as stated in the methods) given that the case cohort is 100% male.

    We thank the reviewer for their insightful comments. Sex was used as a covariate (or fixed effect) to control for the anatomical differences in development of the urethra (and in utero hormonal changes) between the sexes in the control cohort (clarified in the methods, page 28, lines 651-653). Given the PheWAS findings (page 13, line 292-297) reveal an association between the lead variant near TBX5 and female genital prolapse and urinary incontinence, this suggests that while women do not develop PUV (due to differences in urethral development) they may manifest other lower urinary tract phenotypes. In theory, removing the female individuals from the control cohort should therefore strengthen the association as the signal would not be diluted by ‘affected’ women (i.e., those with potentially unknown lower urinary tract phenotypes). We tested this by performing a sex-specific male-only GWAS and found that the strength of association at both lead variants increased. The results of this have been added to the manuscript (page 7, line 149-155).

    The results of the chromosome X rare variant analysis are shown on the Manhattan plot (Figure 9), with no significant genes identified. We have added chromosome X to the mixed-ancestry and European GWAS as suggested (with no significant results) and the Manhattan and Q-Q plots have been updated in Figure 2 and Figure 6. The number of analyzed variants in each analysis has also been updated accordingly.