Genome-wide characterization of clonal hematopoiesis reveals extensive non-coding putative driver mutations
Abstract
As humans age, we acquire somatic mutations in our blood, leading to clonal hematopoiesis (CH). Despite the prevalence of CH in aged individuals, recent searches for selective sweeps in single-cell-derived colonies have revealed that most clones have expanded without a known driver mutation. This extensive, unexplained CH motivated our search for novel driver mutations across ∼490K blood whole-genome sequences from the UK Biobank. We searched across variants with a minor allele count of at least 10 (∼147M variants) to discover alleles that are enriched in aged individuals. We identified 45 variants, including variants in known CH driver genes (e.g., DNMT3A, ASXL1, SF3B1) and 35 novel variants.
Among the novel age-associated variants, we identified 72 carriers of a somatic mutation in the TERT promoter. Although previously reported in pan-cancer analyses, TERT promoter mutations have typically been excluded from population searches for CH. We also observed a somatic intronic insertion in UGT2B7 in 1,165 carriers, a cluster of IGH point mutations, and centromeric variation. We conducted a phenome-wide association study (PheWAS) across 30 common disease phenotypes to characterize the phenotypic correlates of these mutations, finding 965 links between somatic mutations and common diseases, including 37 protective associations. After estimating the total liability-scale variance in each common disease explained by CH mutations, we found that non-canonical CH contributed 28% of the variance explained. We then performed a genome-wide association study of the IGH mutations, finding that common germline variation at GRAMD1B is strongly associated with IGH mutations, and finally conducted a proteome-wide association study to characterize the plasma proteomic correlates of CH. Overall, we characterize CH at both increased breadth and resolution, tracing the entire cascade from upstream germline risk haplotypes to downstream clinical correlates. We release our summary statistics in a publicly accessible portal, somatic.emory.edu.