Fragment end motif analysis to distinguish pathogens from contaminants in enriched plasma microbial DNA


Abstract

Introduction

Despite its promise, the diagnostic accuracy of microbial cell-free DNA (mDNA) in plasma is limited by its low abundance and by process contaminants. We previously showed that combining size selection with single-stranded DNA (ssDNA) library preparation increased mDNA yield 200-fold but also decreased sensitivity for pathogen detection because of higher background noise. A recent study showed that pathogen-derived DNA was enriched for the CC dinucleotide at 5’ ends compared to contaminants. Because ssDNA libraries preserve sequence motifs at both fragment ends (5’ and 3’), we hypothesized that analyzing nucleotide motifs at microbial fragment ends in size-selected ssDNA libraries could help differentiate pathogen DNA from background noise.

Methods

We performed deep sequencing on size-selected ssDNA libraries (<110 bp) generated from longitudinal plasma samples of 11 critically ill patients (5 with culture-proven infections, 20 samples; 6 without infections, 18 samples) and from 6 no-template controls (NTCs). For each 1-mer and 2-mer motif, we calculated the ratio of its observed frequency at 5’ and 3’ fragment ends in the sequencing data to its expected frequency in the corresponding reference genome (O/E ratio). We then compared motif enrichment between pathogen and contaminant DNA fragments.
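The O/E calculation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, fragments are assumed to be already aligned and given as plain strings in reference orientation, and the expected frequency is assumed to be the k-mer frequency over all positions of a single reference strand.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Frequency of each k-mer over all positions of a sequence (expected)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: n / total for motif, n in counts.items()}


def end_motif_frequencies(fragments, k, end="5"):
    """Frequency of each k-mer at the 5' or 3' end of fragments (observed)."""
    if end not in ("5", "3"):
        raise ValueError("end must be '5' or '3'")
    usable = [f.upper() for f in fragments if len(f) >= k]
    counts = Counter(f[:k] if end == "5" else f[-k:] for f in usable)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: n / total for motif, n in counts.items()}


def oe_ratios(fragments, reference, k, end="5"):
    """Observed/expected ratio for every k-mer present in the reference."""
    observed = end_motif_frequencies(fragments, k, end)
    expected = kmer_frequencies(reference, k)
    # Motifs never seen at fragment ends get an O/E ratio of 0.
    return {motif: observed.get(motif, 0.0) / exp
            for motif, exp in expected.items()}


# Toy input, purely illustrative (not data from the study).
toy_reference = "AACC"
toy_fragments = ["CCAA", "CCAC", "AACC"]
ratios_5p = oe_ratios(toy_fragments, toy_reference, k=2, end="5")
ratios_3p = oe_ratios(toy_fragments, toy_reference, k=2, end="3")
```

An O/E ratio above 1 means the motif occurs at fragment ends more often than the genome composition alone would predict; a ratio below 1 means it is depleted.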

Results

Pathogen-derived mDNA fragments showed stronger bias in O/E end-motif ratios than contaminants across all 3 groups (NTCs, no infection and culture-proven infection), at both 5’ and 3’ fragment ends. Notably, the GG dinucleotide was enriched at the 3’ end in pathogens compared to contaminants (P < 0.0001). Combining the O/E ratios for the C and G nucleotides at the 3’ end achieved areas under the receiver operating characteristic curve of >0.98 for distinguishing common contaminants from culture-proven pathogens.
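As a rough illustration of how an area under the ROC curve can be computed from a combined score, the sketch below uses the rank-based (Mann-Whitney) definition. The abstract does not state how the C and G ratios were combined; summing them is an assumption made here only for illustration, and the function names and numeric values are hypothetical, not results from the study.

```python
def combined_score(oe_c, oe_g):
    """Combine the two 3'-end O/E ratios (assumption: simple sum)."""
    return oe_c + oe_g


def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Synthetic (C, G) O/E pairs, invented for demonstration only.
toy_pathogens = [(1.5, 1.5), (2.0, 2.0)]
toy_contaminants = [(0.5, 0.5), (1.0, 1.0)]

toy_auc = roc_auc(
    [combined_score(c, g) for c, g in toy_pathogens],
    [combined_score(c, g) for c, g in toy_contaminants],
)
```

An AUC of 1.0 means every pathogen scores above every contaminant, and 0.5 means the score separates the two groups no better than chance.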

Conclusions

Pathogen-derived mDNA in size-selected ssDNA libraries is biased at 5’ and 3’ fragment ends compared to contaminants. Incorporating microbial fragment end motif analysis can enhance the signal-to-noise ratio and improve pathogen detection and identification in plasma metagenomic sequencing.
