Characterizing cytosine methylation of polymorphic human transposable element insertions using human pangenome resources
Abstract
Cytosine methylation is an important epigenetic modification that plays a crucial role in genomic regulation. Conventional bisulfite-conversion methods based on second-generation sequencing call methylation by comparing bisulfite-treated reads with a reference genome. They therefore cannot characterize methylation in regions that do not align to the reference. Taking advantage of recent improvements in third-generation sequencing, we investigated the methylation patterns of non-reference insertions from the human population in human lymphoblastoid cell lines (LCLs), with a focus on polymorphic transposable elements (TEs). We first characterized whole-genome CpG methylation using both PacBio SMRT and Oxford Nanopore (ONT) sequencing and benchmarked their performance against whole-genome bisulfite sequencing (WGBS) using five human LCLs included in the draft Human Pangenome Reference. Methylation calls from both platforms were highly correlated with conventional WGBS results across the genome. The differences between PacBio and ONT calls on the same sample were comparable to those between two WGBS replicates of the same sample. Using long-read data from the draft Human Pangenome Reference, we characterized CpG methylation of non-reference insertions, especially polymorphic TEs. We focused on two questions: 1) do newly inserted TEs adopt the methylation pattern of their genomic context? and 2) does methylation spread from new TE insertions into their flanking regions? We found that most non-TE insertions exhibit DNA methylation patterns consistent with their genomic context, whereas TE insertions are consistently methylated, with few exceptions. We also found limited methylation spreading from Alu/L1 insertions into their flanking genomic regions. Finally, we examined INDEL frequency in both hypermethylated and hypomethylated CpG islands and found that INDELs are enriched in hypermethylated CpG islands.
Our work demonstrates the methylation-calling capability of third-generation sequencing and its unique advantage in characterizing epigenomic features at non-reference positions.
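The platform benchmark described above (long-read calls versus WGBS) reduces to correlating per-CpG methylation fractions over the sites both assays cover. The sketch below is illustrative only and is not the authors' pipeline. It assumes a hypothetical input format in which each call set is a mapping from `(chromosome, position)` to `(methylated_reads, total_reads)`, a hypothetical minimum-coverage filter of 5 reads, and Pearson correlation as the agreement metric. The abstract does not specify any of these choices.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical per-CpG call format: (chromosome, position) -> (methylated reads, total reads)
Site = Tuple[str, int]
CpGCalls = Dict[Site, Tuple[int, int]]


def methylation_fraction(calls: CpGCalls, min_coverage: int) -> Dict[Site, float]:
    """Convert read counts to methylation fractions, dropping low-coverage CpGs."""
    return {
        site: meth / total
        for site, (meth, total) in calls.items()
        if total >= min_coverage
    }


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def compare_platforms(a: CpGCalls, b: CpGCalls, min_coverage: int = 5) -> Tuple[int, float]:
    """Correlate two call sets over CpGs that pass the coverage filter in both.

    Returns (number of shared CpG sites, Pearson r). The threshold of 5 reads
    is an assumed default, not a value taken from the study.
    """
    fa = methylation_fraction(a, min_coverage)
    fb = methylation_fraction(b, min_coverage)
    shared = sorted(fa.keys() & fb.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared CpG sites")
    r = pearson([fa[s] for s in shared], [fb[s] for s in shared])
    return len(shared), r
```

The same function can compare two WGBS replicates, which provides the replicate-level baseline against which PacBio-versus-ONT differences would be judged. Restricting the comparison to shared, adequately covered sites keeps coverage gaps in one assay from being counted as disagreement.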