Ancient Trans-Species Polymorphism at the Major Histocompatibility Complex in Primates

    This paper makes a valuable contribution to the study of balancing selection at the Major Histocompatibility Complex (MHC), including trans-species polymorphism between humans and other primates. It does so by incorporating a large evolutionary range of species and genes and by using newer methodological approaches to characterize the depth and extent of trans-species polymorphism across an expanded range of primate taxa. While the presented results solidly support the authors' conclusions, additional analyses would be needed to firmly exclude modes of evolution that could mimic trans-species polymorphism.

Classical genes within the Major Histocompatibility Complex (MHC) are responsible for peptide presentation to T cells, thus playing a central role in immune defense against pathogens. These genes are subject to strong selective pressures, including both balancing and directional selection, resulting in exceptional genetic diversity: thousands of alleles per gene in humans. Moreover, some alleles appear to be shared between primate species, a phenomenon known as trans-species polymorphism (TSP) or incomplete lineage sorting, which is rare in the genome overall. However, despite the clinical and evolutionary importance of MHC diversity, we currently lack a full picture of primate MHC evolution. To start addressing this gap, we explore variation across genes and species in our companion paper (Fortier2024a), and here we explore variation within individual genes. We used Bayesian phylogenetic methods to determine the extent of TSP at 17 MHC genes, including classical and non-classical Class I and Class II genes. We find strong support for deep TSP in 7 of 10 classical genes, including, remarkably, between humans and Old World monkeys in MHC-DQB1. Despite the long-term persistence of ancient lineages, we additionally observe rapid evolution at nucleotides encoding the proteins' peptide-binding domains. The most rapidly evolving amino acid positions are extremely enriched for autoimmune and infectious disease associations. Together, these results suggest complex selective forces, arising from differential peptide binding, that drive short-term allelic turnover within lineages while also maintaining deeply divergent lineages for at least 31 million years.

  2. Reviewer #1 (Public Review):

    HLA genes have long been known to harbor trans-species polymorphism (TSP). This manuscript aimed to use state-of-the-art analyses and updated genotyping data to rigorously test for the presence of TSP in HLA genes, quantify the timescales associated with HLA TSP, and relate HLA disease associations to evolutionary rates. To do this, the authors chose HLA/MHC alleles across great apes, Old World monkeys, and New World monkeys on which to perform phylogenetic analyses, alongside non-parametric tests that compare patterns of synonymous diversity. Finally, HLA genetic associations with disease were correlated with evolutionary rate.


    The manuscript is well written and neatly organized, the figures are clear, and there are many supplementary analyses that will make this paper a great resource for MHC phylogenetics at allelic resolution.

    The deployment of modern methodology such as BEAST2 also makes it possible to test whether the hypothesis of TSP is supported while accounting for uncertainty in tree topology and evolutionary rates, a necessary addition to analyses of the MHC.
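    One way to phrase a tree-based TSP test is as a posterior probability: across trees sampled from the posterior, how often do one species' alleles form a single clade? A low value favors TSP. The sketch below is a minimal illustration, assuming rooted trees (as BEAST2 time trees are) already parsed into nested tuples; the allele labels are placeholders, and this is not necessarily the statistic the authors computed.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# A rooted tree written as nested tuples; leaves are allele labels (strings).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(node: Tree) -> FrozenSet[str]:
    """All leaf labels below (and including) this node."""
    if isinstance(node, str):
        return frozenset([node])
    labels: Set[str] = set()
    for child in node:
        labels |= leaf_set(child)
    return frozenset(labels)


def clades(node: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of every node in the tree, i.e. every clade it contains."""
    found = {leaf_set(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clades(child)
    return found


def posterior_monophyly(trees: Iterable[Tree], group: Set[str]) -> float:
    """Fraction of posterior tree samples in which `group` forms a single clade."""
    sampled = list(trees)
    target = frozenset(group)
    return sum(target in clades(t) for t in sampled) / len(sampled)


# Hypothetical example: two posterior samples, two human alleles.
human = {"HLA-A*01", "HLA-A*02"}
samples = [
    ((("HLA-A*01", "HLA-A*02"), "Patr-A*01"), "Mamu-A1*001"),  # human alleles form a clade
    ((("HLA-A*01", "Patr-A*01"), "HLA-A*02"), "Mamu-A1*001"),  # a chimp allele nests inside
]
print(posterior_monophyly(samples, human))  # 0.5
```

    In practice the sampled trees would come from the BEAST2 posterior log after burn-in; averaging over them is what carries the topological uncertainty into the estimate.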


    Because TSP has already been convincingly demonstrated to occur in the MHC, the primary benefit of the current study is to confirm that these previous observations are still supported by the wealth of genetic data now available and by modern phylogenetic approaches. However, the benefit of using the robust BEAST2 method comes at the cost of not using all available data. Focusing on single-gene trees with only a small subset of alleles may bias results, and the inclusion/exclusion criteria should be better defined.

    One major point that is somewhat overlooked is copy number variation of MHC genes arising through classic birth-and-death evolution. For example, MHC-B in New World monkeys is duplicated many times (up to 10 copies; PMID: 23715823). This duplication is naturally accompanied by gene loss and pseudogene formation. All of this muddies the waters considerably, yet none of it is addressed here. MHC-A is a good example: it has been very difficult to assign orthologs, even among closely related species, because of alternative or incomplete duplication and loss across species, or region configuration polymorphism (e.g. PMID: 26371256). Chimpanzee Patr-AL, for instance, shares similarities with the human HLA-A*02 lineage but is a separate locus. Could this show up as TSP under the current analysis?

    Similarly, an alternative explanation for apparent TSP is convergent gene conversion: intergenic gene conversion has been repeatedly observed in HLA genes, and the possibility of it occurring independently between the same two genes becomes more realistic over 45 million years. If the same two MHC genes recombined in humans and in a New World monkey (NWM), each on its own lineage, this would appear as TSP and would cause an overlap in pairwise synonymous divergence between human-human and human-NWM allele comparisons. This might be especially likely for the MHC-DR and MHC-DQ genes presented in Figure 2, since both humans and NWM have multiple MHC-DRB and DQB genes (or were genes other than HLA/MHC-DRB1, such as DRB3, DRB4, and DRB5, included in the DRB phylogenies?). While BEAST2 may be a good way of robustly modeling and identifying TSP, and I understand that these analyses cannot accommodate many more sequences, the authors should consider adding an analysis that rules out gene conversion as an explanation for their results (especially the often-repeated claim of 45 million years of TSP). For example, could the authors use BLAST to ensure that the alleles underlying 45 million years of TSP do not share close similarity with other HLA genes present in their respective human and NWM genomes? This seems like it could be performed fairly quickly for all genes, and even if it argued against TSP, it would be an interesting finding.
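    The overlap described above can be made concrete, and the allele pairs that create it can be singled out for a paralog-similarity screen such as the suggested BLAST search. This is a minimal sketch, assuming pairwise dS values have already been estimated (for example with a Nei–Gojobori-style method); the allele names and dS values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def overlap_fraction(within: Sequence[float], between: Sequence[float]) -> float:
    """Share of between-species dS values no larger than the largest
    within-species dS value. A non-zero share is expected under TSP, but
    convergent gene conversion could produce it too."""
    ceiling = max(within)
    return sum(d <= ceiling for d in between) / len(between)


def overlapping_pairs(between: Dict[Pair, float], ceiling: float) -> List[Pair]:
    """Cross-species allele pairs that fall inside the within-species range:
    the candidates to screen for close similarity to paralogous genes."""
    return sorted(pair for pair, d in between.items() if d <= ceiling)


# Hypothetical dS values and placeholder allele names.
human_human = [0.02, 0.05, 0.11, 0.14]
human_nwm = {
    ("human_a", "nwm_x"): 0.09,
    ("human_b", "nwm_x"): 0.21,
    ("human_b", "nwm_y"): 0.30,
}
frac = overlap_fraction(human_human, list(human_nwm.values()))
flagged = overlapping_pairs(human_nwm, max(human_human))
print(flagged)  # [('human_a', 'nwm_x')]
```

    The overlap statistic alone cannot separate TSP from convergent gene conversion; its value here is in narrowing the follow-up check to the few pairs that drive the signal.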

    Finally, the authors have limited themselves to a small subset of HLA/MHC alleles and do not provide sufficient information in the Methods to understand how these were chosen, nor sufficient discussion of how the inclusion/exclusion criteria could bias results. For example, the authors say the alleles were chosen at 2-digit (i.e. 1-field) resolution, but in the phylogenies of Fig. 2, I see variable numbers of alleles chosen for each 2-digit allele family. What metric was used to select these alleles?
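    In current HLA nomenclature, the first colon-separated field after the asterisk names the allele group (the older "2-digit" level). Tallying the chosen alleles per group, as in this minimal sketch, would make any unevenness in sampling explicit. The allele list is illustrative only, not the set analysed in the paper, and the parser assumes colon-delimited names (older names such as A*0201 would need different handling).

```python
from collections import Counter
from typing import Dict, Iterable


def first_field(allele: str) -> str:
    """Gene name plus first field, e.g. 'HLA-A*02:01:01' -> 'HLA-A*02'."""
    gene, _, fields = allele.partition("*")
    return f"{gene}*{fields.split(':')[0]}"


def alleles_per_group(alleles: Iterable[str]) -> Dict[str, int]:
    """Number of sampled alleles in each first-field (2-digit) allele group."""
    return dict(Counter(first_field(a) for a in alleles))


# Illustrative allele list, not the set analysed in the paper.
chosen = ["HLA-A*01:01", "HLA-A*02:01:01", "HLA-A*02:06", "HLA-A*02:11", "HLA-A*24:02"]
print(alleles_per_group(chosen))  # {'HLA-A*01': 1, 'HLA-A*02': 3, 'HLA-A*24': 1}
```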

    "We also collected associations between amino acids and TCR phenotypes." It is not clear what was analyzed or what the results of this part of the analysis were. This is a topic of much debate, and none of the previous work is discussed (see PMID: 18304006 and PMID: 29636542 as primers on this contentious subject).

    MHC class I molecules also interact with NK cell receptors, including the polymorphic KIR. Through their interactions during infection control and reproduction, the two complexes co-evolve across primates, contributing to the maintenance of MHC diversity. Interaction with KIR likely has a greater impact on HLA polymorphism than interaction with TCR, yet this is not factored into any of the models, or indeed mentioned in the text.

    One additional reason why including KIR binding is important relates to the point above about gene conversion: it is established that gene conversion reproducibly swaps KIR-binding motifs among MHC class I alleles and genes. HLA-A*23, *24, *25, and *32, for example, are characterized by the acquisition of the 'Bw4' motif from HLA-B (PMID: 26284483), likely followed by positive natural selection. For exon 2 (which encodes the motif), these alleles fall in a clade distinct from other human HLA-A alleles (Fig 2-S1). What is the impact of the Bw4 motif on this phylogeny? Could this shuffling of motifs be interpreted as indirect TSP?

    The analysis showing that the most rapidly evolving sites occur in the peptide-binding domain brings little new to the field. This was established by Hughes and Nei (cited) and by Parham, Lawlor, and colleagues in 1988 (e.g. PMID: 3375250), and has been replicated multiple times across human populations and many other species.
    The same applies to the disease association analysis. It is nice to have a summary of the known associations, but there are others out there, and this one is far from thorough. Here, 50% of the information about infectious diseases appears to be taken from one reference, leaving out some major bodies of work, for example studies identifying specific peptide-binding residues or peptides associated with HIV (PMID: 22896606) or malaria control (PMID: 1280333). It is also missing some major concepts, such as the DRB1 'shared epitope' of peptide-binding residues that predisposes to rheumatoid arthritis and protects from Parkinson's disease (35 years of work, from PMID: 2446635 through PMID: 30910980), and the nasopharyngeal carcinoma and EBV story (e.g. PMID: 23209447). Another huge gap is pregnancy syndromes, for example the associations of specific HLA-C and NK cell receptor allotypes with preeclampsia. There are thousands of HLA associations not considered in this section, and doing them justice would likely require an enormous amount of work.
    Thus, neither the idea that HLA/MHC polymorphism is focused on peptide binding nor the idea that this binding drives resistance to infection and associations with disease is new. The previous work in these areas is inadequately acknowledged.

    The paper is written in very approachable language, which is nice to read and friendly to non-experts, but perhaps a little too much so in places. The paper also follows a very non-traditional format: the Results section, for example, reads as a mixture of Introduction, Methods, figure legends, and Discussion, with no real, solid description of the results.

  3. Reviewer #2 (Public Review):

    Fortier and Pritchard investigated the breadth and depth of trans-species polymorphism (TSP) within six primate classical (antigen-presenting) major histocompatibility complex (MHC) genes (three MHC class I and three MHC class II). The MHC is of wide interest because of its unique evolutionary patterns within the genomes of jawed vertebrates and for its extensive and consistent associations with disease phenotypes. The findings of the paper are:

    1. Trans-species polymorphism (TSP) within major histocompatibility complex (MHC) genes, whereby some alleles are more similar between species than within species, occurs between humans and non-human primates despite rapid allelic turnover.
    2. Highly polymorphic/rapidly evolving sites are mostly involved in peptide binding.
    3. The identified, rapidly-evolving sites are associated with disease.

    However, because these general findings have been previously demonstrated to varying extents by numerous other studies, they are not the strength of this paper. The strength and importance of this paper lie in its use of a large evolutionary range of species and genes, its methodological approach, and the extent of the analyses undertaken to characterize the depth and extent of TSP among primates. The major contribution of this paper is showing that TSP in the MHC is widespread among diverse primate taxa and that, depending on the particular MHC gene, TSP can be detected between humans and non-human primates as distantly related to humans as the New World monkeys of the Americas, whose lineage diverged from ours ~45 million years ago.

    The paper, overall, made good methodological choices to account for the fascinating but challenging nature of the MHC, which includes its extensive allelic polymorphism (much of which is characterized only for the peptide-binding domain, encoded by exons 2 and 3), the difficulty of assessing phylogenetic relationships (particularly due to recombination and/or interallelic gene conversion), and the difficulty of differentiating convergence from conservation. No single analysis can perfectly account for all these factors. This paper used two methods to test for TSP, Bayesian evolutionary analysis and synonymous nucleotide distances (dS), and articulates the respective strengths and limitations of each. TSP, to varying degrees, is supported by both analyses. The paper further identifies rapidly evolving positions within the MHC molecules (predominantly located in the MHC peptide-binding domain), quantitatively shows that they are more likely to be in proximity to the bound peptide, and shows, via a literature review of HLA fine-mapping studies, that those positions are associated with both infectious and autoimmune disease.
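    The proximity result is an enrichment claim, and a standard way to test one is a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test on a 2x2 table of fast versus not fast by near versus not near the peptide. The sketch below uses hypothetical counts; the review does not say which test the authors applied, and treating positions as exchangeable ignores structural dependence between neighbouring residues.

```python
from math import comb


def enrichment_p_value(k: int, n_fast: int, n_near: int, n_total: int) -> float:
    """One-sided hypergeometric tail P(X >= k): the chance that at least k of
    n_fast rapidly evolving positions fall among the n_near peptide-proximal
    positions if fast positions were placed at random among n_total positions.
    This equals the one-sided Fisher's exact test for the 2x2 table."""
    upper = min(n_fast, n_near)
    hits = sum(
        comb(n_near, i) * comb(n_total - n_near, n_fast - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_fast)


# Hypothetical counts: 180 positions, 30 near the peptide, 10 fast, 8 of them near.
p = enrichment_p_value(k=8, n_fast=10, n_near=30, n_total=180)
print(f"p = {p:.2e}")
```

    Exact integer arithmetic via `math.comb` avoids overflow for sequence-length inputs; the same function would apply to the disease-association enrichment by redefining "near" as "disease-associated".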

    The conclusions of the paper are therefore supported and appropriate, with the most important caveats noted, but the paper would benefit from:

    1. Addressing how copy number variation of MHC class I genes among primate species might have affected their analyses and results (only single representative genes of the class II MHC, which also exhibit copy number variation, were used for this study).
    2. Considering the differences between class I and class II MHC roles in immune function and how those might relate to the observed patterns.