A reference-free strategy for circulating tumor DNA detection from whole-genome sequencing data

Abstract

Circulating tumor DNA (ctDNA) is emerging as a promising biomarker for postoperative monitoring of cancer patients. Precise estimation of the circulating tumor fraction is crucial for evaluating treatment effects and for timely detection of disease recurrence. All current ctDNA detection methods that utilize whole-genome sequencing (WGS) data rely on aligning sequencing reads to a reference genome and often apply separate tools for detecting different variant types. However, various confounders of bioinformatic analysis and the use of external variant calling tools could be avoided by analyzing k-mers from unaligned sequencing reads. While k-mer-based methods have successfully been applied for somatic variant validation and detection, the potential of k-mer-based ctDNA detection is unexplored. We have developed a tumor-informed, reference-free ctDNA detection tool called ctDNAmer that detects tumor-specific somatic variation directly from unaligned sequencing data by identifying k-mers unique to the tumor DNA. ctDNAmer detects variant information across the genome by comparing the primary tumor and germline WGS data and accounts for sample-specific germline variability and technical noise in the same framework. We tested the utility of ctDNAmer for tumor fraction estimation on postoperative plasma cfDNA WGS data (mean sequencing depth ~28x) from 90 stage III colorectal cancer patients with three years of follow-up. The tumor fraction (TF) estimates agreed with the available clinical information, and ctDNA was detected in 77% (17/22) of patients with recurrence, with a median lead time of 8 months compared to radiological imaging. We further validated ctDNAmer’s tumor fraction estimates by comparing them with the mean cfDNA allele frequencies of somatic clonal SNVs identified from aligned primary tumor sequencing data. The TF estimates showed a strong Pearson correlation of 0.897 with the mean allele frequencies and improved ctDNA detection across samples, with an AUC of 0.79 compared to 0.75 when the mean allele frequency of clonal mutations is used.
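The core idea described above, finding k-mers present in the tumor but absent from the germline and then looking for them in cfDNA reads, can be illustrated with a toy sketch. This is not ctDNAmer's implementation: the function names, the `min_tumor_count` threshold, the use of canonical k-mers, and the tiny k are assumptions made for illustration, and the sketch does not model germline variability or technical noise as the abstract says ctDNAmer does. The value it returns is a naive share of tumor-specific k-mers, not a calibrated tumor fraction.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers over unaligned reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def tumor_specific_kmers(tumor_reads, germline_reads, k, min_tumor_count=2):
    """Keep k-mers seen at least min_tumor_count times in the tumor and never in the germline.

    min_tumor_count is a hypothetical, crude guard against sequencing errors.
    """
    tumor = count_kmers(tumor_reads, k)
    germline = count_kmers(germline_reads, k)
    return {
        kmer for kmer, n in tumor.items()
        if n >= min_tumor_count and kmer not in germline
    }


def tumor_kmer_share(cfdna_reads, specific_kmers, k):
    """Naive share of cfDNA k-mer occurrences that match a tumor-specific k-mer."""
    counts = count_kmers(cfdna_reads, k)
    total = sum(counts.values())
    hits = sum(n for kmer, n in counts.items() if kmer in specific_kmers)
    return hits / total if total else 0.0


# Toy data: the tumor read differs from the germline read by one substitution.
germline_read = "ACGGTTCAGGCTA"
tumor_read = "ACGGTTGAGGCTA"

specific = tumor_specific_kmers([tumor_read] * 2, [germline_read] * 2, k=5)
cfdna = [germline_read] * 9 + [tumor_read]
print(len(specific), tumor_kmer_share(cfdna, specific, k=5))
```

In this toy case a single substitution produces k tumor-specific k-mers, one for each window that overlaps the variant, which is why no alignment or separate variant caller is needed to pick it up. Real WGS data would call for a dedicated k-mer counter and a much larger k than this in-memory sketch uses.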
