mclUMI: Markov clustering of unique molecular identifiers enables dynamic removal of PCR duplicates
Abstract
Molecular quantification in high-throughput sequencing experiments relies on accurate identification and removal of polymerase chain reaction (PCR) duplicates. The use of Unique Molecular Identifiers (UMIs) in sequencing protocols has become a standard approach for distinguishing individual molecules. However, PCR artefacts and sequencing errors in UMIs make effective UMI collapsing and accurate molecular counting difficult. Current computational strategies for UMI collapsing often lack flexibility: they produce fixed deduplicated counts that do not adapt to varying experimental conditions. To address these limitations, we developed mclUMI, a tool that uses the Markov clustering algorithm to identify original UMIs accurately and eliminate PCR duplicates. Unlike conventional methods, mclUMI automatically detects independent communities within UMI graphs by dynamically fine-tuning the inflation and expansion parameters, enabling context-dependent merging of UMIs based on their connectivity patterns. Through in silico experiments, we demonstrate that mclUMI generates deduplication outcomes that adapt to diverse experimental scenarios, performing best under high sequencing error rates. By integrating connectivity-driven clustering, mclUMI improves the accuracy of molecular counting in noisy sequencing environments and addresses the rigidity of current UMI deduplication frameworks.
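The abstract does not spell out the algorithm, so what follows is a minimal, generic Markov clustering (MCL) sketch in Python, not mclUMI's implementation. It assumes that UMIs are linked in the graph when they differ by at most one base (`max_dist=1`); the helper names (`build_graph`, `mcl`) and the default parameter values are hypothetical. Expansion spreads flow along random walks through the UMI graph, inflation sharpens that flow, and each connected group of UMIs eventually drains into an attractor. Each resulting cluster is then counted as one original molecule.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(umis, max_dist=1):
    """Adjacency matrix linking UMIs within `max_dist` mismatches (assumed rule).

    Self-loops are added, as is customary for MCL, so no column is all zeros.
    """
    n = len(umis)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if hamming(umis[i], umis[j]) <= max_dist:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Return clusters (frozensets of node indices) found by Markov clustering."""
    n = len(adj)
    m = normalize(adj)
    for _ in range(max_iter):
        # Expansion: take the `expansion`-th matrix power so flow spreads
        # along longer random walks.
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: raise entries to the power `inflation`, then re-normalize,
        # which strengthens strong flows and weakens weak ones.
        inflated = normalize([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that keeps non-negligible flow is an attractor; the columns it
    # attracts form one cluster (identical clusters are reported once).
    clusters = []
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


umis = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC"]
clusters = mcl(build_graph(umis))
print(len(clusters))  # → 2 (estimated number of distinct molecules)
```

In this sketch, raising `inflation` generally produces finer, more numerous clusters, while larger `expansion` values tend to merge more UMIs. Sweeping these two parameters is the kind of adjustment the abstract describes, although how mclUMI actually selects them is not stated here.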