Supervised non-negative matrix factorization on cell-free DNA fragmentomic features enhances early cancer detection

Abstract

Background

Cell-free circulating DNA (cfDNA) fragments exhibit non-random patterns in their length (FLEN), end-motif (EM), and distance to nucleosome position (ND). While these cfDNA features have shown promise as inputs for machine learning and deep learning models in early cancer detection, most studies utilize them as raw inputs, overlooking the potential benefits of pre-processing to extract cancer-specific features. This study aims to enhance cancer detection accuracy by developing a novel approach to feature extraction from cfDNA fragmentomics.

Methods

We implemented a supervised non-negative matrix factorization (SNMF) algorithm to generate embedding vectors that capture cancer-specific signals within cfDNA fragmentomic features. These embeddings served as input to a machine learning model that distinguishes cancer patients from healthy individuals.
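The abstract does not specify the SNMF formulation, so the following is only a minimal sketch of one common variant, not the authors' implementation. It assumes a joint factorization in which the feature matrix and a one-hot label matrix share the same non-negative embedding W, fitted with standard multiplicative updates. The function names (`fit_snmf`, `transform`), the weight `lam`, the toy data, and the nearest-centroid classifier standing in for the downstream machine learning model are all hypothetical.

```python
import numpy as np

def fit_snmf(X, y, n_components=3, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Assumed formulation: X ~= W @ H and Y ~= W @ B with W, H, B >= 0."""
    rng = np.random.default_rng(seed)
    Y = np.eye(2)[y]                          # one-hot labels, (n_samples, 2)
    A = np.hstack([X, np.sqrt(lam) * Y])      # features and weighted labels
    W = rng.random((A.shape[0], n_components)) + eps
    G = rng.random((n_components, A.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius loss keep factors >= 0
        G *= (W.T @ A) / (W.T @ W @ G + eps)
        W *= (A @ G.T) / (W @ G @ G.T + eps)
    H, B = G[:, :X.shape[1]], G[:, X.shape[1]:]
    return W, H, B

def transform(X, H, n_iter=200, seed=0, eps=1e-9):
    """Embed samples without labels by solving for W >= 0 with H fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], H.shape[0])) + eps
    XHt, HHt = X @ H.T, H @ H.T
    for _ in range(n_iter):
        W *= XHt / (W @ HHt + eps)
    return W

# Toy non-negative data standing in for fragmentomic feature vectors
rng = np.random.default_rng(0)
X_train = rng.random((40, 12))
y_train = np.repeat([0, 1], 20)               # 0 = healthy, 1 = cancer
X_train[y_train == 1, :4] += 1.0              # synthetic class-specific signal
X_new = rng.random((10, 12))
X_new[5:, :4] += 1.0

W, H, B = fit_snmf(X_train, y_train)
Z_train = transform(X_train, H)               # label-free embeddings
Z_new = transform(X_new, H)

# Nearest-centroid classifier as a placeholder for the downstream model
centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in (0, 1)])
dist = ((Z_new[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
pred = dist.argmin(axis=1)
```

In this sketch the training embeddings are recomputed with `transform` so that training and unseen samples are embedded the same way, without access to labels.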

Results

We validated our framework using two datasets: an in-house cohort of 431 cancer patients and 442 healthy individuals (dataset 1), and a published cohort comprising 90 hepatocellular carcinoma (HCC) patients and 103 individuals with cirrhosis or hepatitis B (dataset 2). In dataset 1, we achieved an AUC of 94% in pan-cancer detection. In dataset 2, our framework achieved an AUC of 100% for HCC vs healthy classification, 99% for HCC vs non-HCC patients classification, and 96% for identifying HCC patients among a mixed group of non-HCC patients and healthy donors.

Conclusion

This study demonstrates the effectiveness of SNMF-transformed features in improving both pan-cancer detection and HCC-specific detection. Our approach advances the use of cfDNA fragmentomics for early cancer detection and could enhance diagnostic accuracy in clinical settings.
