Error Correction in DNA Data Storage Using a PRISMA Guided Systematic Review and an End to End Retrieval Roadmap

Raj Kumar Thapa

Abstract

DNA data storage has progressed from proof-of-concept demonstrations to early system-level workflows, but robust end-to-end retrieval remains constrained by heterogeneous error processes across synthesis, storage, amplification, and sequencing. This paper presents a PRISMA-guided systematic review of 34 studies and introduces a harmonized analytical layer to improve cross-study comparability. We formalize the channel error representation in terms of substitution, insertion, and deletion components, and we apply a CRL (Coding overhead–Retrieval reliability–Latency proxy) framework that compares methods along three axes: coding overhead, retrieval reliability, and a latency proxy based on computational burden. The synthesis shows that substitution-dominant conditions are comparatively tractable with established coding and consensus approaches, whereas synchronization loss driven by insertions and deletions remains the main unresolved barrier, especially in long-read and nanopore-oriented settings. To support deployment decisions, we provide an evidence-to-design workflow that maps error regimes and resource constraints to candidate coding/decoding stacks. We also identify persistent reporting gaps, including inconsistent disclosure of sequencing depth, runtime, and decoding conditions, which limit reproducibility and practical benchmarking. The review contributes a reproducible, implementation-oriented roadmap that moves beyond narrative synthesis toward benchmark-ready method selection for real-world DNA storage systems. [1–34]
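The substitution, insertion, and deletion components mentioned above can be illustrated with a toy channel model. The following is a minimal sketch, not the formalization used in the paper: the function name `ids_channel`, the parameters `p_sub`, `p_ins`, and `p_del`, and the assumption of independent, position-invariant error rates are all hypothetical choices made for illustration.

```python
import random

BASES = "ACGT"

def ids_channel(strand, p_sub, p_ins, p_del, rng=None):
    """Pass a DNA strand through a toy insertion/deletion/substitution channel.

    Assumption (not from the paper): each event occurs independently per
    base with a fixed probability.
    """
    rng = rng or random.Random()
    out = []
    for base in strand:
        # Insertion: emit one random base before the current base.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        # Deletion: drop the current base entirely.
        if rng.random() < p_del:
            continue
        # Substitution: replace the base with one of the three other bases.
        if rng.random() < p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    original = "ACGTACGTACGTACGT"
    # Substitution-only read: length is preserved, so positions stay aligned.
    print(ids_channel(original, p_sub=0.1, p_ins=0.0, p_del=0.0, rng=rng))
    # Indel-heavy read: length changes, so downstream positions shift.
    print(ids_channel(original, p_sub=0.0, p_ins=0.1, p_del=0.1, rng=rng))
```

Under this sketch, a substitution-only read keeps every base at its original index, whereas a single insertion or deletion shifts all later bases. That shift is the synchronization loss the abstract identifies as the harder case.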

Version published to 10.21203/rs.3.rs-8924911/v1 on Research Square
Mar 24, 2026
