Coverage landscape of the human genome in nucleus DNA and cell-free DNA
Abstract
Genome-wide coverage has long been used as a measure of sequencing quality and quantity, but the biology underlying it has not been fully explored. Here we performed comparative analyses of genome-wide coverage profiles between nuclear genomic DNA (gDNA) samples from the 1000 Genomes Project (n=3,202) and cell-free DNA (cfDNA) samples from healthy controls (n=113) or cancer patients (n=362). Regardless of sample type, we observed an overall conserved landscape with coverage segmentation, in which adjacent genomic windows shared similar levels of coverage. Besides GC content, we identified protein-coding gene density and nucleosome density as major factors affecting the coverage of gDNA and cfDNA, respectively. Differential coverage of cfDNA versus gDNA was found in immune-receptor loci, intergenic regions and non-coding genes, reflecting distinct genome activities in different cell types. In cancer cfDNA samples, a further rise in coverage at non-coding genes and intergenic regions and a further drop in coverage at protein-coding genes and genic regions suggested a relative loss of contribution from normal cells. Importantly, we observed a distinctive convergence of coverage in cancer-derived cfDNA, with the extent of convergence positively correlated with cancer stage. Based on these findings, we developed and validated an outlier-detection approach for cfDNA-based cancer screening that does not require cancer samples for training. The method achieved 97% sensitivity on pediatric sarcomas (n=241) and 44% sensitivity on early-stage lung cancers (n=36) with >90% specificity for condition-matched tasks, and 100% sensitivity on late-stage cancers (n=85) for condition-unmatched tasks, outperforming current benchmarks.
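The abstract does not describe the outlier-detection algorithm itself, so the following is only a generic sketch of the idea of screening without cancer samples for training: a reference is fitted on healthy-control coverage profiles alone, and a sample is flagged when its profile deviates from that reference more than most controls do. Everything here is an assumption made for illustration: the function names, the median normalization, the robust z-score, the 90% specificity cut-off, and the synthetic data, in which a "cancer-like" profile is simulated as a flattened (converged) version of the healthy one.

```python
import numpy as np

def normalize_coverage(counts):
    """Divide each sample's per-window read counts by its own median,
    so that samples sequenced at different depths become comparable."""
    counts = np.asarray(counts, dtype=float)
    return counts / np.median(counts, axis=1, keepdims=True)

def fit_reference(healthy):
    """Per-window center and robust scale, estimated from healthy controls only."""
    center = np.median(healthy, axis=0)
    mad = np.median(np.abs(healthy - center), axis=0)
    scale = 1.4826 * mad + 1e-9  # MAD rescaled to approximate a standard deviation
    return center, scale

def outlier_score(samples, center, scale):
    """Mean absolute robust z-score across windows (one score per sample)."""
    return np.mean(np.abs((samples - center) / scale), axis=1)

# Synthetic data (hypothetical, not from the study)
rng = np.random.default_rng(0)
n_windows = 200
base = rng.uniform(0.5, 1.5, size=n_windows)   # healthy coverage landscape
converged = 1.0 + 0.5 * (base - 1.0)           # assumed flattened, cancer-like landscape

healthy = normalize_coverage(rng.poisson(base * 1000, size=(50, n_windows)))
cancer_like = normalize_coverage(rng.poisson(converged * 1000, size=(10, n_windows)))

center, scale = fit_reference(healthy)
healthy_scores = outlier_score(healthy, center, scale)
cancer_scores = outlier_score(cancer_like, center, scale)

# Threshold chosen so that about 90% of the controls fall below it.
# In practice it should be set on held-out controls, not the fitting set.
threshold = np.quantile(healthy_scores, 0.90)
flagged = cancer_scores > threshold
print(f"threshold={threshold:.2f}, flagged {flagged.sum()} of {flagged.size}")
```

Because only control samples enter `fit_reference` and the threshold, no cancer data are needed for training in this sketch; the trade-off is that any systematic difference between a test sample and the controls, such as a different sample-handling condition, also raises the score.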