Transferable cancer detection from cell-free DNA fragment lengths through extensions of non-negative matrix factorization
Abstract
Motivation
The fragment length distribution of cell-free DNA (cfDNA) can reveal the presence of circulating tumor DNA (ctDNA) in a plasma sample. A previous study documented that non-negative matrix factorization (NMF) can extract relevant features from these fragment length distributions. The distributions, however, are also affected by technical biases. When NMF is applied to samples pooled from multiple datasets, some of the extracted signatures tend to capture these technical biases rather than genuine biological differences.
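To make the standard NMF setting concrete, the following minimal sketch factorizes a matrix of per-sample fragment length distributions into non-negative sample weights and length signatures. It is an illustration only, not the implementation from the batch-NMF repository: it uses the generic Lee-Seung multiplicative updates for the Frobenius objective, and the function name `nmf`, the two Gaussian-shaped toy signatures (modes at 167 bp and 145 bp), and the simulated mixing fractions are all hypothetical choices made for this example.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Approximate V (samples x lengths) as W @ H with W, H >= 0.

    Generic Lee-Seung multiplicative updates for the Frobenius objective;
    a stand-in for illustration, not the authors' algorithm.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_lengths = V.shape
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_lengths)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so every signature (row of H) sums to 1, like a distribution;
    # the inverse scaling is absorbed into W so that W @ H is unchanged.
    scale = H.sum(axis=1)
    H /= scale[:, None]
    W *= scale[None, :]
    return W, H


# Toy data (assumed values): fragment lengths from 100 to 220 bp.
lengths = np.arange(100, 221)


def bump(mean, sd):
    """Discretized Gaussian-shaped length distribution (hypothetical)."""
    p = np.exp(-0.5 * ((lengths - mean) / sd) ** 2)
    return p / p.sum()


# Two hypothetical underlying processes: a "background" and a shorter one.
signatures = np.vstack([bump(167, 10), bump(145, 12)])

rng = np.random.default_rng(1)
short_fraction = rng.uniform(0.0, 0.5, size=40)
weights = np.column_stack([1.0 - short_fraction, short_fraction])
V = weights @ signatures  # each row is one sample's length distribution

W, H = nmf(V, rank=2)
reconstruction_error = np.linalg.norm(V - W @ H)
```

In this toy setting every sample comes from one noise-free source. The problem described above arises when the rows of `V` come from several datasets whose length distributions are shifted by technical factors, because plain NMF has no way to distinguish such shifts from biological signal.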
Results
We present two methods for extracting biologically meaningful NMF signatures across heterogeneous datasets with varying technical biases. Using simulated data, we first demonstrate that these new methods estimate the true proportions of the underlying processes more accurately than standard NMF. We then show that the methods increase the transferability of fragment length signatures to cfDNA datasets from external labs. Classification models using the two proposed NMF extensions achieve average cross-dataset AUCs of 0.842 and 0.814, respectively, compared to 0.776 for standard NMF and 0.688 for a set of previously reported, manually selected fragment length features. We further show that one of the proposed methods requires only 1 to 5 samples to estimate the batch effect at inference time.
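As a rough illustration of what an average cross-dataset AUC measures, the sketch below trains a model on one dataset, scores every other dataset, and averages the resulting AUCs. The abstract does not specify the exact evaluation protocol, so the pairwise train/test scheme here is an assumption, as are the helper names `auc`, `mean_cross_dataset_auc`, `fit`, and `predict`, the toy one-feature "classifier", and the toy data.

```python
import numpy as np


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def mean_cross_dataset_auc(datasets, fit, predict):
    """Train on each dataset, test on every other one, average the AUCs.

    Assumed protocol for illustration; the source does not describe it.
    """
    aucs = []
    for train_name, (X_train, y_train) in datasets.items():
        model = fit(X_train, y_train)
        for test_name, (X_test, y_test) in datasets.items():
            if test_name != train_name:
                aucs.append(auc(predict(model, X_test), y_test))
    return float(np.mean(aucs))


# Hypothetical one-feature "model": it only learns the sign of the
# association between the feature and the cancer label.
def fit(X, y):
    y = np.asarray(y, dtype=bool)
    return 1.0 if X[y].mean() > X[~y].mean() else -1.0


def predict(model, X):
    return model * X


# Toy datasets (made-up values): one feature per sample, 0 = control, 1 = cancer.
datasets = {
    "lab_A": (np.array([0.1, 0.2, 0.6, 0.7]), np.array([0, 0, 1, 1])),
    "lab_B": (np.array([0.3, 0.5, 0.4, 0.9]), np.array([0, 0, 1, 1])),
}

result = mean_cross_dataset_auc(datasets, fit, predict)
print(result)  # 0.875: AUC 0.75 on lab_B and 1.0 on lab_A
```

Because every test set comes from a different dataset than the training set, a feature that mostly encodes a dataset-specific technical bias will tend to score poorly under this kind of evaluation.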
Availability and Implementation
The source code and instructions on how to run it are available at: https://github.com/BesenbacherLab/batch-NMF