Coverage landscape of the human genome in nucleus DNA and cell-free DNA

Abstract

Genome-wide coverage has long been used as a measure of sequencing quality and quantity, but the biology underlying it has not been fully exploited. Here we performed comparative analyses of genome-wide coverage profiles between nuclear genomic DNA (gDNA) samples from the 1000 Genomes Project (n=3,202) and cell-free DNA (cfDNA) samples from healthy controls (n=113) or cancer patients (n=362). Regardless of sample type, we observed an overall conserved landscape with coverage segmentation, in which adjacent genomic windows shared similar levels of coverage. Besides GC content, we identified protein-coding gene density and nucleosome density as major factors affecting the coverage of gDNA and cfDNA, respectively. Differential coverage of cfDNA versus gDNA was found in immune-receptor loci, intergenic regions and non-coding genes, reflecting distinct genome activities in different cell types. Within cancer cfDNA samples, a further rise in coverage at non-coding genes/intergenic regions and a further drop in coverage at protein-coding genes/genic regions suggested a relative loss of contribution from normal cells. Importantly, we observed a distinctive convergence of coverage in cancer-derived cfDNA, with the extent of convergence positively correlated with cancer stage. Based on these findings, we developed and validated an outlier-detection approach for cfDNA-based cancer screening that does not require cancer samples for training. The method achieved 97% sensitivity on pediatric sarcomas (n=241) and 44% sensitivity on early-stage lung cancers (n=36) with >90% specificity in condition-matched tasks, and 100% sensitivity on late-stage cancers (n=85) in condition-unmatched tasks, outperforming current benchmarks.
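
As a rough illustration of the general idea of outlier detection against a healthy-only reference, the sketch below scores each sample by the dispersion of its per-window coverage and flags samples that deviate strongly from controls. It is not the authors' method: the abstract does not specify how convergence is measured or how outliers are called. The coefficient of variation as a convergence score, the z-score cutoff, the function names, and the toy read counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev, stdev


def coverage_dispersion(window_counts):
    """Coefficient of variation of per-window read counts.

    Hypothetical convergence score: flatter coverage gives a lower value.
    """
    m = mean(window_counts)
    if m == 0:
        raise ValueError("sample has no coverage")
    return pstdev(window_counts) / m


def fit_reference(control_scores):
    """Summarise healthy-control scores only; no cancer samples are used."""
    return mean(control_scores), stdev(control_scores)


def is_outlier(score, reference, z_cutoff=3.0):
    """Flag a sample whose score lies far from the control distribution.

    The two-sided z-score rule and the cutoff of 3.0 are assumptions.
    """
    ref_mean, ref_sd = reference
    return abs(score - ref_mean) / ref_sd > z_cutoff


# Toy per-window read counts, invented for illustration only.
controls = [
    [80, 120, 100, 60, 140],
    [85, 115, 100, 65, 135],
    [75, 125, 100, 55, 145],
]
reference = fit_reference([coverage_dispersion(c) for c in controls])

flat_sample = [98, 102, 100, 97, 103]     # coverage converged across windows
typical_sample = [82, 118, 100, 62, 138]  # resembles the controls

print(is_outlier(coverage_dispersion(flat_sample), reference))
print(is_outlier(coverage_dispersion(typical_sample), reference))
```

A real pipeline would at minimum correct for GC content and sequencing depth before scoring, since the abstract identifies GC content as a major factor affecting coverage.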