Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This important manuscript, which describes the largest genetic association study to date, uses broadly compelling methods to address the genetic susceptibility to tuberculosis infection. A strength of the paper is that this multi-ancestry meta-analysis of genetic association studies is more powerful than what has been done before. A weakness is that its main result is difficult to interpret due to the complexity of the genetic association signal.

This article has been Reviewed by the following groups

Abstract

The heritability of susceptibility to tuberculosis (TB) disease has been well recognized. Over 100 genes have been studied as candidates for TB susceptibility, and several variants were identified by genome-wide association studies (GWAS), but few replicate. We established the International Tuberculosis Host Genetics Consortium to perform a multi-ancestry meta-analysis of GWAS, including 14,153 cases and 19,536 controls of African, Asian, and European ancestry. Our analyses demonstrate a substantial degree of heritability (pooled polygenic h² = 26.3%, 95% CI 23.7–29.0%) for susceptibility to TB that is shared across ancestries, highlighting an important host genetic influence on disease. We identified one global host genetic correlate for TB at genome-wide significance (p < 5 × 10⁻⁸) in the human leukocyte antigen (HLA)-II region (rs28383206, p-value = 5.2 × 10⁻⁹) but failed to replicate variants previously associated with TB susceptibility. These data demonstrate the complex shared genetic architecture of susceptibility to TB and the importance of large-scale GWAS analysis across multiple ancestries experiencing different levels of infection pressure.

Article activity feed

  1. Author Response

    Reviewer #2 (Public Review):

    This manuscript tackles the important and vexing problem of mapping alleles for TB. It is a really important problem, and this paper presents the largest genetic data set. It does so by amalgamating data from multiple cohorts. The manuscript rightly points out that many studies have not produced reproducible results, and most alleles are population specific, and rarely seen in multiple studies.

    1. Authors find a strong HLA associated SNP. They do conduct HLA imputation, but there is little effective fine-mapping. Authors should report which classical alleles are consistent with this allelic association (e.g. which classical alleles are in phase with it). Authors comment on DQA1-0301, but it isn't clear in the main text how significant it is. I think the authors should dig a little deeper. Imputing amino acids and assessing association might be useful. Finding classical alleles that explain the SNP associations and are seen across populations might be useful. If the authors think that the SNP might be a regulatory allele, the authors should make a case for that based on genomic annotations, eQTL analyses etc.

    We thank the reviewer for pointing out the issues with the HLA section. We also received feedback from another reviewer about the HLA section. Based on this, we have completely reworked the HLA section with a more rigorous analysis, to make the results easier to interpret and to detect potential underlying HLA alleles that could explain the significant SNP detected in the MR-MEGA meta-analysis. This includes a comparison of our summary statistics for the DQA1*02:01 allele with those available from studies that were not included in our genome-wide meta-analysis. The HLA section has been updated on pages 7-9, as shown below, and a figure has been added to the main manuscript (Figure 3B) and to the supplementary data (Figure S2):

    Notwithstanding inconsistency across populations, the strongest signal in the combined global analyses is at DQA1*02:01, revealing a protective effect (OR 0.88, 95% CI 0.82-0.93, p-value = 1.3e-5, Figure 3B). The signal remains apparent in the six populations with the lead SNP at MAF >2.5% and individual level data available (p-value = 0.0003). However, conditioning on the significant SNP (rs28383206) in this subset, we find the signal at DQA1*02:01 all but disappears (p-value = 0.44, Figure S2), suggesting the classical allele is tagging the rs28383206 association. This observation is consistent with previous observations of HLA analysis in Icelandic (DQA1*02:01: OR 0.82, p-value = 7.39e-4) and Han Chinese populations (DQA1*02:01: OR 0.82, p-value = 7.39e-4), but the allele showed the opposite direction of effect in another Chinese population (DQA1*02:01: OR 1.28, p-value = 0.0193, Figure 3B)19,21,23.

    The discussion was also updated (page: 14-15) to incorporate and discuss the updated results as shown below:

    Based on the significant association, rs28383206, in the HLA region identified in this multi-ancestry meta-analysis (Figure 3A), HLA-specific imputation and association testing was done to fine-map the region and identify potential HLA epitopes driving this association. HLA DQA1*02:01 had the strongest signal in the meta-analysis across the 8 included studies (Figure 3B), but this signal disappeared when conditioning on the significant SNP (rs28383206). HLA DQA1*02:01 has previously been identified in an Icelandic and two Chinese populations, but the direction of effect was not consistent19,21,23. Despite these inconsistencies, the association between Mtb and HLA class II should be explored in more detail in future studies. A study investigating outcomes of Mtb exposure in individuals of African ancestry identified protective effects of HLA class II alleles for individuals resistant to TB, highlighting the importance of HLA class II in susceptibility to TB62. HLA class II is a key determinant of the immune response in TB, and Mtb has mechanisms to directly interfere with MHC class II antigen presentation63. This is supported by studies in mice, where mice in which the MHC class II genes were deleted died quickly when exposed to Mtb and died faster than mice in which MHC class I genes were deleted63.

    2. The authors comment on ancestry. Are ancestry components disease associated in any cohort? It might be interesting to demonstrate this.

    We thank the reviewers for this recommendation. While ancestry components have been shown to be disease associated in the admixed (RSA) populations in previous studies, we have considered the fact that effects of genetic ancestry can be severely confounded by socioeconomic factors. Factors such as housing, employment, poverty and access to healthcare have significant impact on TB incidence rates, especially in African populations. We cannot account for these socioeconomic differences in our analysis, but we have updated the manuscript (page: 15) to highlight this issue and the potential impact of socioeconomic factors on our results.

    This is supported by the fact that previous TB genetic association studies have identified significant effects of ancestry on TB susceptibility11,26. However, the effects of genetic ancestry can be confounded by other factors not accounted for in this analysis, such as differences in socioeconomic factors (including differences in housing, employment, poverty, and access to healthcare) between the included study populations59–61. For the ancestry-specific analysis, fewer studies result in there being less input heterogeneity to account for, but the reduced sample size was not sufficient to detect any ancestry-specific genome-wide associations. This is particularly evident for the African ancestry-specific meta-analysis where the large degree of heterogeneity, which could be a result of the high genetic diversity within Africa, in combination with differences in socioeconomic factors compared to other populations included in this study, resulted in no observable suggestive association peaks59,60.

    Reviewer #3 (Public Review):

    This paper was a significant and commendable effort, given all the challenges in TB genetics research. It was generally well written and analyses well done. Analytical methods were appropriate. The inclusion of polygenic heritability estimates is also nice to have within this large work. There is also a wealth of supplemental data provided, which will be useful to the field.

    However, there are a number of important weaknesses that need to be addressed. These are listed here, and recommended revisions are addressed in the recommendations section:

    1. As the authors point out, one of the challenges in this work is the varying phenotype definitions (diagnosis of TB cases, definition of controls) across all the included genetic studies. Table S1 is critical for this, however it is missing information, and some of the information is unclear. More importantly, the authors state multiple times that there is no evidence of heterogeneity due to these variable phenotype definitions, and that genetic ancestry contributes more to differences in effect sizes between GWAS than study design. However, these two things are confounded - different study designs / phenotype definitions were used in studies of different ancestry.

    We thank the reviewer for pointing this out and we have updated Table S1 to define the phenotype definitions and how cases and controls were identified. All datasets should now have clear definitions. As for the impact of different phenotype definitions on the heterogeneity we do agree that these are confounding factors and we do not claim that there is no evidence of phenotype definitions influencing heterogeneity, but rather we claim that the genetic ancestry of the included populations has a larger impact on heterogeneity than other factors investigated in this study. We updated the manuscript to clarify this in the discussion (page: 15) as shown below:

    The p-values of residual heterogeneity in genetic effects between the studies in the multi-ancestry meta-analysis show no significant inflation between the studies, suggesting that differences in study characteristics (phenotype definition, infection pressure, Mtb strain) are not the main contributor to the lack of significant associations, but they certainly have an impact and are compounded with ancestry-correlated heterogeneity and other factors. However, the ancestry-correlated heterogeneity p-values are generally lower than the residual heterogeneity p-values, suggesting that genetic ancestry has a stronger impact on the differences in effect sizes between the studies. This is supported by the fact that previous TB genetic association studies have identified significant effects of ancestry on TB susceptibility11,26. However, the effects of genetic ancestry can be confounded by other factors not accounted for in this analysis, such as differences in socioeconomic factors (including differences in housing, employment, poverty, and access to healthcare), phenotype definitions and differences in infection pressure between the included study populations 60–62.

    And we also updated the polygenic heritability in the results section (page: 4) as shown below:

    Furthermore, variations in phenotype definition can have an impact on heritability estimates (Table S1).

    2. The polygenic heritability analysis table is not explained very well.

    We thank the reviewer for pointing out this issue. The polygenic heritability table (Table S2) has been updated and some columns were removed (as they contained results from a discarded analysis). We have added footnotes to the table to define the variables and make the table more understandable and we have also updated the results section to clarify the analysis (page: 19) as shown below:

    The genetic relationship matrix was calculated for each autosomal chromosome (un-imputed data) which were pruned for SNPs in linkage disequilibrium (LD) using a 50 SNP window, sliding by 10 SNPs at a time and removing all variants with LD greater than 0.5.
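
    The pruning step described above can be illustrated with a minimal sketch. It assumes that "LD" means the squared correlation (r²) between genotype dosages and that the procedure resembles the window-based pairwise pruning offered by tools such as PLINK; the excerpt does not name the software used, so the function names, the toy data, and the rule for choosing which SNP of a correlated pair to drop are all hypothetical simplifications.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic SNP: correlation is undefined, treat it as uncorrelated.
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, window=50, step=10, r2_max=0.5):
    """Return the indices of SNPs kept after window-based LD pruning.

    genotypes: list of per-SNP dosage lists (0/1/2), ordered by position
    on a single chromosome. Assumed simplification: when a pair exceeds
    r2_max, the later SNP of the pair is dropped.
    """
    n = len(genotypes)
    removed = set()
    start = 0
    while start < n:
        # SNPs in the current window that survived earlier windows.
        idx = [i for i in range(start, min(start + window, n)) if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                if j in removed:
                    continue
                if r_squared(genotypes[i], genotypes[j]) > r2_max:
                    removed.add(j)
        start += step  # slide the window by `step` SNPs
    return [i for i in range(n) if i not in removed]


# Toy example (hypothetical data): SNP 1 duplicates SNP 0, SNP 2 is uncorrelated.
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [0, 0, 1, 2, 2, 1],
]
kept = ld_prune(toy)
print(kept)  # [0, 2]
```

    In practice this step would be run per autosome on the un-imputed genotypes, as the text describes, and the retained SNPs would then feed into the genetic relationship matrix.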

    And page 19:

    Heritability estimations were transformed onto the liability scale using the GCTA software to account for the difference in the proportion of cases in the data compared to the population prevalence74.
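
    As a rough illustration of the transformation mentioned above, the sketch below applies the widely used conversion from the observed scale to the liability scale for ascertained case-control samples, h²_liability = h²_observed × [K(1−K)]² / [z² × P(1−P)], where K is the population prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the liability threshold. Whether GCTA was run with exactly this form and which prevalence values were used are assumptions here; the function name and the input values are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Convert an observed-scale heritability estimate to the liability scale.

    prevalence    -- assumed population prevalence K (0 < K < 1)
    case_fraction -- proportion of cases P in the analysed sample (0 < P < 1)
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold t
    z = std_normal.pdf(threshold)                     # normal density at t
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Hypothetical inputs: these are NOT values reported in the manuscript.
example = observed_to_liability(h2_observed=0.30, prevalence=0.05, case_fraction=0.45)
print(round(example, 3))
```

    The correction matters because case-control studies typically oversample cases relative to the population, so the same observed-scale estimate maps to different liability-scale values in cohorts with different case proportions.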

    3. The supplemental data file is not very helpful without some sort of guide. It isn't clear whether the wealth of candidate genes that have been studied in TB were examined in these data. That would be a great benefit of this work.

    We thank the reviewer for pointing this out and we agree that the supplemental data Excel workbook was difficult to understand. We have included a readme file (also on sheet 1 of the workbook) to explain which information is in each sheet. This includes a list of candidate SNPs and genes that we investigated along with the meta-analysis results of these candidate SNPs and genes. We also updated the “Prior associations” section of the manuscript in which we cover the results of candidate SNPs and genes (page: 13-14).

    4. There needs to be clarity on how unpublished works were sought. In non-genetic meta-analyses, there is usually some detail about a process of contacting authors, etc. There needs to be some assurance that every attempt was made to collect all the relevant data. It is also not clear why family-based analyses could not be included considering that summary statistics were the basis of analysis.

    I updated the manuscript to address this (page: 18):

    This analysis includes 12 of the 17 published (and un-published, Table 1 and S1) GWAS studies of TB (with HIV negative cohorts) prior to 2022 (refs. 10–17,26). For unpublished works we contacted researchers that were funded for genetic TB research and acquired data sharing agreements to obtain summary statistics (or raw data) along with any meta-data that was available. It excludes data from Iceland and Vietnam 18,31, as they declined to share data. It excludes data from China, Korea, Peru and Japan6,20,21,23,31, as data sharing agreements could not be finalized in time for this analysis. The Indonesian and Moroccan data were too sparsely genotyped and not suitable for reliable imputation, and the Moroccan data were also family-based and thus also not suitable for this meta-analysis, as this would introduce confounding effects from the inclusion of related individuals24,25.

    5. It is rather surprising that only one locus meets genome-wide significance. The authors do explain this well in terms of the ancestry-specific effects driving these results, but it is also surprising that no candidate genes (that had not been discovered in GWAS studies, but were rather studied separately) rose to some higher significance threshold.

    We agree that it is surprising that we did not detect more significant associations and failed to replicate any candidate SNPs or genes at a genome wide significance level. We aim to have future iterations of this analysis with more data to increase power to detect more variants of interest, but this is beyond the scope of this manuscript.

  2. eLife assessment

    This important manuscript, which describes the largest genetic association study to date, uses broadly compelling methods to address the genetic susceptibility to tuberculosis infection. A strength of the paper is that this multi-ancestry meta-analysis of genetic association studies is more powerful than what has been done before. A weakness is that its main result is difficult to interpret due to the complexity of the genetic association signal.

  3. Reviewer #1 (Public Review):

    The manuscript describes a multi-ancestry meta-analysis of genome-wide association studies of tuberculosis risk from case-control cohorts across several European, Asian, and African countries.

    A main finding is that there is substantial common-variant heritability of tuberculosis risk, which is well established. However, this analysis needs to be adjusted for differing case-control ratios in order to put the heritability estimates onto the liability scale so that variation across countries/cohorts can be properly assessed.

    The authors find the strongest statistical evidence for association at a HLA locus. However, because of the complexity of this region and the diversity across ancestries, interpretation of this association is difficult.

    This manuscript shows that there is potential to identify heritable sources of tuberculosis risk across ancestries. However, better genotyping of the HLA region and larger sample sizes will be needed to make further progress.

  4. Reviewer #2 (Public Review):

    This manuscript tackles the important and vexing problem of mapping alleles for TB. It is a really important problem, and this paper presents the largest genetic data set. It does so by amalgamating data from multiple cohorts. The manuscript rightly points out that many studies have not produced reproducible results, and most alleles are population specific, and rarely seen in multiple studies.

    1. Authors find a strong HLA associated SNP. They do conduct HLA imputation, but there is little effective fine-mapping. Authors should report which classical alleles are consistent with this allelic association (e.g. which classical alleles are in phase with it). Authors comment on DQA1-0301, but it isn't clear in the main text how significant it is. I think the authors should dig a little deeper. Imputing amino acids and assessing association might be useful. Finding classical alleles that explain the SNP associations and are seen across populations might be useful. If the authors think that the SNP might be a regulatory allele, the authors should make a case for that based on genomic annotations, eQTL analyses etc.
    2. The authors comment on ancestry. Are ancestry components disease associated in any cohort? It might be interesting to demonstrate this.

  5. Reviewer #3 (Public Review):

    This paper was a significant and commendable effort, given all the challenges in TB genetics research. It was generally well written and analyses well done. Analytical methods were appropriate. The inclusion of polygenic heritability estimates is also nice to have within this large work. There is also a wealth of supplemental data provided, which will be useful to the field.

    However, there are a number of important weaknesses that need to be addressed. These are listed here, and recommended revisions are addressed in the recommendations section:
    1. As the authors point out, one of the challenges in this work is the varying phenotype definitions (diagnosis of TB cases, definition of controls) across all the included genetic studies. Table S1 is critical for this, however it is missing information, and some of the information is unclear. More importantly, the authors state multiple times that there is no evidence of heterogeneity due to these variable phenotype definitions, and that genetic ancestry contributes more to differences in effect sizes between GWAS than study design. However, these two things are confounded - different study designs / phenotype definitions were used in studies of different ancestry.
    2. The polygenic heritability analysis table is not explained very well.
    3. The supplemental data file is not very helpful without some sort of guide. It isn't clear whether the wealth of candidate genes that have been studied in TB were examined in these data. That would be a great benefit of this work.
    4. There needs to be clarity on how unpublished works were sought. In non-genetic meta-analyses, there is usually some detail about a process of contacting authors, etc. There needs to be some assurance that every attempt was made to collect all the relevant data. It is also not clear why family-based analyses could not be included considering that summary statistics were the basis of analysis.
    5. It is rather surprising that only one locus meets genome-wide significance. The authors do explain this well in terms of the ancestry-specific effects driving these results, but it is also surprising that no candidate genes (that had not been discovered in GWAS studies, but were rather studied separately) rose to some higher significance threshold.