Transferable cancer detection from cell-free DNA fragment lengths through extensions of non-negative matrix factorization

Ludvig Renbo Olsen
Jakob Qvortrup Holsting
Nicolai Juul Birkbak
Lars Dyrskjøt
Jakob Skou Pedersen
Claus Lindbjerg Andersen
Søren Besenbacher

Abstract

Motivation

The fragment length distribution of cell-free DNA (cfDNA) can reveal the presence of circulating tumor DNA (ctDNA) in a plasma sample. A previous study documented that non-negative matrix factorization (NMF) can extract relevant features from the fragment length distributions. These distributions, however, are affected by technical biases. When NMF is performed on samples from multiple datasets, some of the extracted signatures often capture these technical biases rather than the actual biological differences.
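As a rough illustration of the standard-NMF baseline mentioned above, the sketch below factorizes a matrix of fragment length distributions (length bins × samples) into non-negative signatures W and per-sample contributions H, using the Lee-Seung multiplicative updates for the squared Frobenius loss. The toy data (two Gaussian-shaped length profiles mixed in random proportions) and the names `simulate_length_matrix` and `nmf` are hypothetical and chosen for illustration only; this is not the implementation in the batch-NMF repository.

```python
import numpy as np


def simulate_length_matrix(n_samples=40, seed=1):
    """Toy data (hypothetical): each column is one sample's fragment length
    distribution over 100-220 bp, mixed from two Gaussian-shaped profiles."""
    rng = np.random.default_rng(seed)
    lengths = np.arange(100, 221)  # 121 length bins

    def profile(mean, sd):
        p = np.exp(-0.5 * ((lengths - mean) / sd) ** 2)
        return p / p.sum()  # normalize to a probability distribution

    W_true = np.column_stack([profile(167, 10), profile(145, 12)])
    frac = rng.uniform(0.0, 0.5, size=n_samples)  # weight on the second profile
    H_true = np.vstack([1.0 - frac, frac])
    return W_true @ H_true, frac


def nmf(X, k, n_iter=2000, eps=1e-10, seed=0):
    """Standard NMF, X ~= W @ H, via Lee-Seung multiplicative updates that
    minimize the squared Frobenius error ||X - W @ H||^2.

    X: non-negative array of shape (n_bins, n_samples).
    Returns W (n_bins, k), signatures whose columns sum to 1, and
    H (k, n_samples), the per-sample contribution of each signature.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps avoids division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    # Move the scale into H so each signature is a distribution over length bins;
    # the product W @ H is unchanged by this rescaling.
    scale = W.sum(axis=0, keepdims=True)  # shape (1, k)
    return W / scale, H * scale.T


X, frac = simulate_length_matrix()
W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the toy matrix is a noise-free two-component mixture, the reconstruction error should be small. With real samples pooled from several labs, some components can instead absorb lab-specific technical biases, which is the problem the two proposed extensions target. Note also that NMF signatures are only identified up to ordering and scaling.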

Results

We present two methods for extracting biologically meaningful NMF signatures across heterogeneous datasets with varying technical biases. Using simulated data, we first demonstrate that these new methods estimate the true proportions of the underlying processes more accurately. We then show that the methods increase the transferability of fragment length signatures to cfDNA datasets from external labs. Classification models using the two proposed NMF extensions achieve average cross-dataset AUCs of 0.842 and 0.814, respectively, compared to 0.776 for standard NMF and 0.688 for a set of previously reported, manually selected fragment length features. We further show that one of the proposed methods requires only 1 to 5 samples to estimate the batch effect during inference.
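The abstract does not spell out how the average cross-dataset AUC is computed. Assuming that a classifier is trained on one dataset and scored on each of the others, the average could be computed as in the sketch below. The names `auc`, `mean_cross_dataset_auc`, `fit`, and `predict`, as well as the two toy datasets, are hypothetical. The AUC is computed in its Mann-Whitney form: the probability that a randomly chosen cancer sample scores higher than a randomly chosen control, with ties counted as one half.

```python
from itertools import permutations

import numpy as np


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive scores above a random negative, with ties counting 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def mean_cross_dataset_auc(datasets, fit, predict):
    """Average AUC over every ordered (train, test) pair of distinct datasets.

    datasets: dict mapping a dataset name to (features, labels).
    fit(features, labels) -> model; predict(model, features) -> scores.
    """
    aucs = []
    for train_name, test_name in permutations(datasets, 2):
        model = fit(*datasets[train_name])
        X_test, y_test = datasets[test_name]
        aucs.append(auc(y_test, predict(model, X_test)))
    return float(np.mean(aucs))


# Hypothetical toy "model": ignores training data and scores by the first feature
def fit(X, y):
    return None


def predict(model, X):
    return X[:, 0]


datasets = {
    "lab_A": (np.array([[0.9], [0.7], [0.2], [0.1]]), np.array([1, 1, 0, 0])),
    "lab_B": (np.array([[0.6], [0.3], [0.4], [0.2]]), np.array([1, 1, 0, 0])),
}
print(mean_cross_dataset_auc(datasets, fit, predict))  # prints 0.875
```

Averaging over ordered pairs treats each train/test direction as a separate transfer task; a leave-one-dataset-out scheme (train on all datasets but one) is an equally plausible reading of "cross-dataset".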

Availability and Implementation

The source code and instructions on how to run it are available at: https://github.com/BesenbacherLab/batch-NMF

Version published on medRxiv (DOI: 10.1101/2025.09.25.25336628), Sep 27, 2025