Helicobacter pylori Genome Aggregation Database reveals complex evolutionary forces shaping its genomic landscape and clinical impact in East Asia
Abstract
We assembled HPgnomAD, a comprehensive global genome aggregation database of Helicobacter pylori (H. pylori) featuring 7,544 high-quality genomes from 278 populations, along with the first species-wide haplotype reference panel for genotype imputation. The panel provides high accuracy across diverse H. pylori populations, including for low-coverage data, and is openly accessible with integrated analysis tools (https://www.hpgnomad.top/). Variant discovery identified 1.82 million SNPs and 0.65 million InDels, with African strains exhibiting the greatest diversity and East Asian strains showing unexpectedly high novel variation. Fine-scale analysis of the hpEAsia lineage uncovered six new sublineages shaped by altitude- and latitude-related divergence and region-specific admixture. Phylogenomic dating indicated two divergence waves in East Asia, paralleling Upper Paleolithic settlement and Neolithic human expansions, with highland–lowland separation at ~18.8 kya, followed by the formation of complex geography-related population substructures. Genome-wide scans detected adaptive loci related to metal acquisition, nitrogen metabolism, surface adhesion, and membrane transport, including altitude-associated, highly differentiated variants linked to antibiotic resistance and gastric disease. By integrating genomic, evolutionary, environmental, and clinical data, HPgnomAD offers a framework for studying H. pylori evolution and host–pathogen adaptation and for informing precision medicine worldwide.