Transferable cancer detection from cell-free DNA fragment lengths through extensions of non-negative matrix factorization

Abstract

Motivation

The fragment length distribution of cell-free DNA (cfDNA) can reveal the presence of circulating tumor DNA (ctDNA) in a plasma sample. A previous study showed that non-negative matrix factorization (NMF) can extract informative features from these distributions. However, the distributions are also shaped by technical biases. When NMF is applied to samples pooled from multiple datasets, some of the extracted signatures often capture these technical biases rather than actual biological differences.
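As a minimal sketch of the standard NMF setup described above, the snippet below factorizes a matrix whose rows are per-sample fragment length distributions into non-negative per-sample weights and shared signatures. It uses the classic Lee and Seung multiplicative updates on simulated data. The 100 to 250 bp bin range, the two Gaussian-shaped length profiles, the simulated tumor fractions, and the function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nmf(V, k, n_iter=1000, eps=1e-10, seed=0):
    """Factorize non-negative V (samples x length bins) as W @ H.

    W (samples x k) holds per-sample weights; H (k x bins) holds signatures.
    Uses Lee and Seung multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def to_proportions(W, H):
    """Rescale so each signature sums to 1 and each row of P sums to 1."""
    mass = H.sum(axis=1)                 # total mass of each signature
    H_norm = H / mass[:, None]           # signatures as length distributions
    P = W * mass[None, :]                # keeps W @ H == P @ H_norm
    return P / P.sum(axis=1, keepdims=True), H_norm

# Simulated data: two hypothetical length profiles over 100-250 bp.
bins = np.arange(100, 251)

def profile(mu, sd):
    g = np.exp(-0.5 * ((bins - mu) / sd) ** 2)
    return g / g.sum()

S = np.vstack([profile(167, 10), profile(145, 12)])

rng = np.random.default_rng(1)
f = rng.uniform(0, 0.5, size=40)        # hypothetical tumor-derived fraction
V = np.column_stack([1 - f, f]) @ S     # each row is a length distribution

W, H = nmf(V, k=2)
P, H_norm = to_proportions(W, H)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

NMF is only identifiable up to permutation and scaling of its components, which is why the sketch normalizes signatures before reading off per-sample proportions; matching recovered components to true processes would still require aligning them.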

Results

We present two methods for extracting biologically meaningful NMF signatures across heterogeneous datasets with varying technical biases. Using simulated data, we first demonstrate that these new methods estimate the true proportions of the underlying processes more accurately. We then show that the methods improve the transferability of fragment length signatures to cfDNA datasets from external laboratories. Classification models using the two proposed NMF extensions achieve average cross-dataset AUCs of 0.842 and 0.814, respectively, compared with 0.776 for standard NMF and 0.688 for a set of previously reported, manually selected fragment length features. We further show that one of the proposed methods requires only 1 to 5 samples to estimate the batch effect during inference.
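The abstract does not specify how the two extensions work. Purely to illustrate the general idea of separating shared biological signatures from dataset-specific technical bias, the sketch below adds a hypothetical per-dataset, per-bin multiplicative bias vector c_b to standard NMF, so that each dataset b is modeled as V_b ≈ (W_b H) diag(c_b) with signatures H shared across datasets. This model, the function name `batch_nmf`, and the simulated length-dependent bias are assumptions made for this example; it is not the method implemented in the linked batch-NMF repository.

```python
import numpy as np

def batch_nmf(batches, k, n_iter=2000, eps=1e-10, seed=0):
    """Hypothetical batch-aware NMF (illustration only).

    Models dataset b as V_b ~ (W_b @ H) * c_b, where H (k x bins) is shared
    and c_b (bins,) is a per-bin multiplicative bias. c_0 is fixed to 1 so
    the first dataset is the reference, removing the H/c_b scale ambiguity.
    """
    rng = np.random.default_rng(seed)
    m = batches[0].shape[1]
    H = rng.random((k, m)) + eps
    Ws = [rng.random((V.shape[0], k)) + eps for V in batches]
    cs = [np.ones(m) for _ in batches]
    for _ in range(n_iter):
        # Shared signatures: multiplicative update pooling all datasets.
        num = sum((W.T @ V) * c for W, V, c in zip(Ws, batches, cs))
        den = sum((W.T @ W @ H) * c**2 for W, c in zip(Ws, cs))
        H *= num / (den + eps)
        for b, V in enumerate(batches):
            Hb = H * cs[b]  # signatures as distorted by dataset b's bias
            Ws[b] *= (V @ Hb.T) / (Ws[b] @ Hb @ Hb.T + eps)
            if b > 0:
                # Closed-form least-squares update of the per-bin bias.
                R = Ws[b] @ H
                cs[b] = (V * R).sum(axis=0) / ((R * R).sum(axis=0) + eps)
    return Ws, H, cs

def rel_error(batches, Ws, H, cs):
    num = sum(np.linalg.norm(V - (W @ H) * c) ** 2
              for V, W, c in zip(batches, Ws, cs))
    return np.sqrt(num / sum(np.linalg.norm(V) ** 2 for V in batches))

# Simulated data: same two length profiles in both datasets, but the
# second dataset carries a hypothetical bias that favors short fragments.
bins = np.arange(100, 251)

def profile(mu, sd):
    g = np.exp(-0.5 * ((bins - mu) / sd) ** 2)
    return g / g.sum()

S = np.vstack([profile(167, 10), profile(145, 12)])
rng = np.random.default_rng(1)

def simulate(n, bias):
    f = rng.uniform(0, 0.5, size=n)
    V = np.column_stack([1 - f, f]) @ S * bias
    return V / V.sum(axis=1, keepdims=True)  # rows are length distributions

batches = [simulate(30, np.ones(bins.size)),
           simulate(30, np.exp(-0.01 * (bins - 175)))]
Ws, H, cs = batch_nmf(batches, k=2)
err = rel_error(batches, Ws, H, cs)
print(f"relative reconstruction error: {err:.4f}")
```

Because the bias enters as a separate factor, the shared H is not forced to spend a component on the length tilt of the second dataset. Estimating a full per-bin bias from only a handful of new samples would need additional constraints (for example, a smooth or low-dimensional bias), which this sketch does not include.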

Availability and Implementation

The source code and instructions on how to run it are available at: https://github.com/BesenbacherLab/batch-NMF