Error Correction in DNA Data Storage Using a PRISMA Guided Systematic Review and an End to End Retrieval Roadmap
Abstract
DNA data storage has progressed from proof-of-concept demonstrations to early system-level workflows, but robust end-to-end retrieval remains constrained by heterogeneous error processes across synthesis, storage, amplification, and sequencing. This paper presents a PRISMA-guided systematic review of 34 studies and introduces a harmonized analytical layer to strengthen cross-study comparability. We formalize the channel error representation as substitution, insertion, and deletion components and apply a CRL (Coding overhead–Retrieval reliability–Latency proxy) framework that compares methods by coding overhead, retrieval reliability, and a computational-burden proxy for latency. The synthesis shows that substitution-dominant conditions are comparatively tractable with established coding and consensus approaches, whereas synchronization loss driven by insertions and deletions remains the main unresolved barrier, especially in long-read and nanopore-oriented settings. To support deployment decisions, we provide an evidence-to-design workflow that maps error regimes and resource constraints to candidate coding/decoding stacks. We also identify persistent reporting gaps, including inconsistent disclosure of sequencing depth, runtime, and decoding conditions, which limit reproducibility and practical benchmarking. This review contributes a reproducible, implementation-oriented roadmap that moves beyond narrative synthesis toward benchmark-ready method selection for real-world DNA storage systems. [1–34]
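The sketch below illustrates the substitution/insertion/deletion decomposition and why indels cause synchronization loss. It is a minimal toy under stated assumptions, not the review's channel model, which this abstract does not specify. It assumes a memoryless channel in which each input base independently experiences at most one event, with hypothetical per-base probabilities `p_sub`, `p_ins`, and `p_del`. The comparison helpers show that a single substitution costs one unit under both position-aligned (Hamming-style) comparison and edit distance. A single deletion also costs one edit, but it shifts every downstream base and therefore corrupts almost every aligned position.

```python
import random


def ids_channel(strand, p_sub, p_ins, p_del, rng=None, alphabet="ACGT"):
    """Toy i.i.d. insertion/deletion/substitution channel (illustrative assumption).

    For each input base, at most one event occurs: deletion with prob p_del,
    insertion (random base placed before it) with prob p_ins, substitution with
    prob p_sub; otherwise the base is copied unchanged.
    """
    if p_sub + p_ins + p_del > 1:
        raise ValueError("error probabilities must sum to at most 1")
    if rng is None:
        rng = random.Random()
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is lost
        elif r < p_del + p_ins:
            out.append(rng.choice(alphabet))  # insertion: spurious base
            out.append(base)
        elif r < p_del + p_ins + p_sub:
            out.append(rng.choice([b for b in alphabet if b != base]))  # substitution
        else:
            out.append(base)  # error-free copy
    return "".join(out)


def hamming_mismatches(a, b):
    """Position-aligned mismatches plus the length difference (no resynchronization)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions, deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


ref = "ACGTACGTAC"
one_sub = "ACGTTCGTAC"  # substitution at position 4
one_del = ref[1:]       # deletion of the first base
print(hamming_mismatches(ref, one_sub), edit_distance(ref, one_sub))  # 1 1
print(hamming_mismatches(ref, one_del), edit_distance(ref, one_del))  # 10 1
noisy = ids_channel(ref, 0.01, 0.01, 0.01, rng=random.Random(0))  # one simulated read
```

Codes designed only for substitutions implicitly rely on the aligned view. The gap between the two counts in the deletion case is a compact way to see why synchronization-aware coding or alignment-based consensus is needed when insertions and deletions dominate.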