Comprehensive Error Profiling of NovaSeq6000, NovaSeqX, and Salus Pro Using Overlapping Paired-End Reads



Abstract

Accurate base calling is essential for ensuring the reliability of next-generation sequencing (NGS) data. However, current approaches to error profiling often rely on reference genomes or variant-calling pipelines, which can introduce platform-specific biases or overlook systematic sequencing errors. Here, we present ErrorProfiler, a reference-free workflow that quantifies and characterizes base-call errors from the overlap between paired-end reads. We applied this approach to profile three high-throughput sequencing platforms (NovaSeq6000, NovaSeqX, and Salus Pro) using Escherichia coli datasets. By analyzing mismatches within fully overlapping 150 bp read regions, ErrorProfiler effectively distinguished sequencing-specific errors from PCR or library preparation artifacts.
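The overlap idea can be illustrated with a minimal sketch. It is not ErrorProfiler's implementation: it assumes that both mates have the same length and cover exactly the same fragment, so read 1 can be compared position by position with the reverse complement of read 2. The function names are hypothetical. The reasoning is that an error already present in the template molecule (for example from PCR) is read identically by both mates, whereas a base-calling error usually affects only one of them and shows up as a mismatch.

```python
# Illustrative sketch only; names and normalization are assumptions,
# not the ErrorProfiler implementation.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(read1: str, read2: str) -> list[int]:
    """Return 0-based read-1 positions where fully overlapping mates disagree.

    Positions where either mate reports N are skipped.
    """
    if len(read1) != len(read2):
        raise ValueError("sketch assumes equal-length, fully overlapping mates")
    mate = reverse_complement(read2)
    return [
        i
        for i, (a, b) in enumerate(zip(read1, mate))
        if a != b and "N" not in (a, b)
    ]


def mismatch_rate_per_mille(pairs) -> float:
    """Mismatches per 1000 compared positions over an iterable of (read1, read2)."""
    mismatches = 0
    compared = 0
    for read1, read2 in pairs:
        mate = reverse_complement(read2)
        compared += sum(1 for a, b in zip(read1, mate) if "N" not in (a, b))
        mismatches += len(overlap_mismatches(read1, read2))
    return 1000.0 * mismatches / compared if compared else 0.0
```

A mismatch only shows that one of the two base calls is wrong, so this figure counts disagreements per compared position rather than errors per sequenced base. How the reported error rates are normalized is not stated in the abstract.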

NovaSeqX exhibited the lowest median sequencing error rate (0.089‰), with minimal cycle-dependent drift. Salus Pro followed (0.124‰), outperforming NovaSeq6000 (0.185‰). Base quality score calibration analysis revealed systematic underestimation of base-call accuracy across all platforms. Illumina’s binned Q-score models (Q11/Q25/Q37; Q12/Q24/Q40) masked substantial intra-bin variability, while Salus Pro’s continuous Q-score system offered finer granularity but still overestimated error probabilities in key bins.
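Calibration in this sense compares the error probability implied by a reported Q-score with the error rate actually observed in that bin. The sketch below uses the standard Phred relation Q = -10 log10(p). The function names are hypothetical, and the counts in the usage note are made up for illustration; they are not results from the study.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def empirical_q(errors: int, total: int) -> float:
    """Empirical Phred score from observed errors among `total` base calls.

    Returns infinity when no errors were observed, because the point estimate
    of the error probability is then zero.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if errors == 0:
        return math.inf
    return -10.0 * math.log10(errors / total)
```

For example, a bin labelled Q40 predicts an error probability of about 1e-4. If a hypothetical bin showed 1 error in 100,000 calls, `empirical_q(1, 100_000)` would be about 50, meaning the reported score underestimates accuracy (equivalently, overestimates the error probability) for that bin.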

A previously unreported “upstream effect” was observed, where the empirical accuracy of a base-call was significantly influenced by the Q-score of its preceding base. This effect was most pronounced in high-Q bins and consistent across all platforms. Substitution error spectra revealed distinct platform-specific and sequence-context-dependent biases, driven by optical systems and detection chemistry.
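One way to look for such an effect is to tally errors by the pair (Q-score of the current base, Q-score of the preceding base) and compare error rates across preceding-base Q-scores within the same current-base bin. The sketch below assumes a per-base error flag is already available for each read; how ErrorProfiler attributes an overlap mismatch to one mate is not described in the abstract, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def stratify_by_upstream_q(quals, error_flags, table=None):
    """Accumulate [errors, total] counts keyed by (current Q, preceding Q).

    quals       -- Q-scores for one read, in sequencing order
    error_flags -- booleans of the same length, True where the call is wrong
    table       -- optional existing tally to update across many reads
    """
    if len(quals) != len(error_flags):
        raise ValueError("quals and error_flags must have the same length")
    if table is None:
        table = defaultdict(lambda: [0, 0])
    # Start at index 1: the first base has no preceding base.
    for i in range(1, len(quals)):
        key = (quals[i], quals[i - 1])
        table[key][0] += int(error_flags[i])
        table[key][1] += 1
    return table
```

Passing the returned table back in as `table` for each subsequent read builds up counts over a whole run; the error rate for a key is then `errors / total`, and can be converted to an empirical Q-score with the Phred relation.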

Our findings provide a rigorous, platform-agnostic framework for sequencing error analysis and quality assessment. ErrorProfiler is open-source and compatible with diverse sequencing platforms, offering a valuable tool for improving calibration, variant calling, and downstream bioinformatics workflows.
