Structural variation, selection, and diversification of the NPIP gene family from the human pangenome
Abstract
The NPIP (nuclear pore interacting protein) gene family has expanded to high copy number in humans and African apes, where it has been subject to an excess of amino acid replacement consistent with positive selection (1). Due to the limitations of short-read sequencing, human genetic diversity of NPIP has been poorly understood. Using highly accurate assemblies generated from long-read sequencing as part of the human pangenome, we completely characterize 169 human haplotypes (4,665 NPIP paralogs and alleles). Of the 28 NPIP paralogs, just three (NPIPB2, B11, and B14) are fixed at a single copy, and only a single locus, B2, shows no structural variation. Four NPIP paralogs map to large segmental duplication blocks that mediate polymorphic inversions (355 kbp–1.6 Mbp) corresponding to microdeletions associated with developmental delay and autism. Haplotype-based tests of positive selection and selective sweeps identify two paralogs, B9 and B15, within the top percentile for both tests. Using full-length cDNA data from 101 tissue/cell types, we construct paralog-specific gene models and show that 56% of the most abundant isoforms (31/55) have not been previously described in RefSeq. We define six distinct translation start sites and other protein structural features that distinguish paralogs, including a variable number tandem repeat that encodes a beta helix of variable size and that emerged ∼3.1 million years ago in human evolution. Among the 28 NPIP paralogs, we identify distinct tissue and developmental patterns of expression, with only a few maintaining the ancestral testis-enriched expression. A subset of paralogs (NPIPA1, A5, A6-9, B3-5, and B12/B13) show increased brain expression. Our results suggest ongoing positive selection in the human population and rapid diversification of NPIP gene models.