Unsupervised detection of fragment length signatures of circulating tumor DNA using non-negative matrix factorization

Curation statements for this article:
  • Curated by eLife

Abstract

Sequencing of cell-free DNA (cfDNA) is currently used to detect cancer by searching for both mutational and non-mutational alterations. Recent work has shown that the length distribution of cfDNA fragments from a cancer patient can inform tumor load and type. Here, we propose non-negative matrix factorization (NMF) of fragment length distributions as a novel and completely unsupervised method for studying fragment length patterns in cfDNA. Using shallow whole-genome sequencing (sWGS) of cfDNA from a cohort of patients with metastatic castration-resistant prostate cancer (mCRPC), we demonstrate that NMF accurately infers the true tumor fragment length distribution as an NMF component, and that the sample weights of this component correlate with ctDNA levels (r = 0.75). We further demonstrate that using several NMF components enables accurate cancer detection on data from various early-stage cancers (AUC = 0.96). Finally, we show that NMF, when applied across genomic regions, can be used to discover fragment length signatures associated with open chromatin.
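
As a rough illustration of the idea in the abstract, the following sketch factorizes synthetic fragment length profiles with a small pure-Python NMF and checks that the weight of the shorter-fragment component tracks the simulated tumor fraction. Everything here is an assumption made for illustration and not the authors' pipeline: the Gaussian-shaped signatures, their means (167 bp and 145 bp), the tumor fractions, and the helper names (`nmf`, `gaussian_profile`, `pearson`) are hypothetical, and the Lee-Seung multiplicative updates are simply one common way to fit an NMF.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, n_iter=1000, seed=0, eps=1e-12):
    """Approximate v (samples x lengths) as w @ h with non-negative factors,
    using Lee-Seung multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    # Rescale so every signature (row of h) sums to 1; w absorbs the scale.
    for i in range(k):
        s = sum(h[i])
        h[i] = [x / s for x in h[i]]
        for row in w:
            row[i] *= s
    return w, h


def gaussian_profile(lengths, mean, sd):
    """Synthetic, normalized fragment length distribution (an assumption)."""
    raw = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in lengths]
    total = sum(raw)
    return [r / total for r in raw]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


lengths = list(range(100, 221, 4))             # fragment length bins in bp
healthy = gaussian_profile(lengths, 167, 12)   # hypothetical non-tumor signature
tumor = gaussian_profile(lengths, 145, 12)     # hypothetical shorter tumor signature
tumor_fraction = [0.0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

# One row per simulated sample: a mixture of the two signatures.
profiles = [[(1 - f) * a + f * b for a, b in zip(healthy, tumor)]
            for f in tumor_fraction]

# The factorization sees only the profiles, never the tumor fractions.
weights, signatures = nmf(profiles, k=2)

# Call the component with the shorter mean fragment length "tumor-like".
mean_len = [sum(x * p for x, p in zip(lengths, sig)) for sig in signatures]
tumor_idx = mean_len.index(min(mean_len))
tumor_weight = [row[tumor_idx] / sum(row) for row in weights]

r = pearson(tumor_weight, tumor_fraction)
print(f"Pearson r between tumor-like weight and simulated fraction: {r:.3f}")
```

The simulated tumor fractions are used only to evaluate the result, which is what makes the factorization itself unsupervised. For real data, a library implementation such as scikit-learn's `NMF` would likely be a more practical choice than this hand-rolled loop.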

Article activity feed

  1. Evaluation Summary:

    This work describes a shallow whole-genome sequencing (sWGS) approach (<1x coverage) for genome-wide cell-free DNA fragmentation analysis using non-negative matrix factorization (NMF). The use of 'fragmentomics' as an early cancer detection tool is under intense investigation, with multiple recent publications in the field. This work adds an interesting unsupervised approach.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

  2. Reviewer #1 (Public Review):

    Renaud et al. aim to develop a new NMF-based computational method that relies on differences between the fragment size profiles of cfDNA extracted from the plasma of healthy individuals and of patients with metastatic castration-resistant prostate cancer. The manuscript is clear and the drawings are very helpful.

  3. Reviewer #2 (Public Review):

    Renaud et al. performed a fragment length signature analysis of circulating tumor DNA using non-negative matrix factorization (NMF). The authors set out to demonstrate that NMF accurately infers the true tumor fragment length distribution as an NMF component. The sample weights of this component correlated with ctDNA levels (r = 0.75). This method could potentially remove the need for genetic variations (e.g. somatic mutations or copy number aberrations) when estimating tumor DNA load. Nonetheless, the cancer detection analysis in this study should be interpreted as supervised even though NMF itself is an unsupervised method, because the authors used a support vector machine (SVM, a supervised algorithm) to assess how well cancer cases can be distinguished from controls.

  4. Reviewer #3 (Public Review):

    The presented approach uses fragment size distributions to classify cancer patients. Although the observation that a large share of mutant fragments are shorter than the respective wild-type fragments is not novel, it is certainly of value to apply novel methods to multiple cohorts. Biologically, the findings of this study seem largely to support previous insights on cfDNA fragment sizes, but compared to other studies leveraging fragmentomics, an important advance presented in this manuscript is the use of an unsupervised approach. Yet the cohorts used in this study design seem to limit the ability to draw a clear clinical conclusion. Using mCRPC samples, two fragment length signatures (tumor, non-tumor) were extracted that correlated with tumor fractions in plasma. Although the authors demonstrate in subsampling experiments that NMF was more robust than the ichorCNA algorithm when less data is available, they did not provide convincing data that NMF is more sensitive with respect to tumor fraction. Moreover, when comparing fragment sizes from targeted NGS data with the NMF results, the authors did not correct for variants related to clonal hematopoiesis. With respect to cancer detection, their approach did not outperform previous approaches.